Abstract

Image indexing is the process of retrieving images from
databases of images or videos based on their contents.
In particular, histogram-based algorithms are considered
to be effective for color image indexing. We propose a
new method of color space quantization in the CIELUV
color space, named weighted LUV quantization. With this
method, each bin in the LUV space has a different weighting
factor, which is applied in the histogram intersection.
The weighted LUV histogram intersection exploits the
perceptual uniformity of the CIELUV color space. An
additional advantage is that the weighting factors take
into account the greater perceptual sensitivity to more
saturated colors.



1. Introduction 

The image indexing process must support automated
feature extraction, efficient indexing, and effective
retrieval of images from a database. Color features have
proven to be efficient in discriminating between relevant
and non-relevant images. Moreover, histogram-based
techniques have been widely studied and are considered
to be effective for color image indexing [9]. The key
issues in histogram-based techniques are the selection of an
appropriate color space and the quantization of the selected
color space.

In this paper, we propose a quantization and histogram
matching algorithm in the CIELUV color space. In the
proposed quantization, the LUV space, which encloses the
entire CIELUV color space obtained by converting the RGB
color space, is subdivided into uniformly sized bins along
each axis. Each bin in the LUV space has a different
weighting factor according to the volume of the CIELUV
color space included within it. We call this quantization
scheme weighted LUV quantization. The weighting
factor for each bin is applied in the calculation of the
histogram intersection that measures the similarity between a
query image and the database images. Apart from the weights,
this histogram comparison algorithm is similar to those of
Swain and Ballard [7], Stricker and Swain [6], and Funt and
Finlayson [3]. Using this weighting scheme, we remove a
number of bins that are almost useless for discriminating
images from a perceptual point of view. We demonstrate the
performance of our algorithm for image indexing through
experiments.

* This work was supported in part by the Automation Research Center
(ARC) designated by KOSEF.

2. Image indexing 

In this section, we review the histogram comparisons 
and the quantization of the selected color space for image 
indexing. 

2.1. Histogram-based indexing 

The advantage of using color histograms is their
robustness with respect to geometric changes of projected
objects. Histograms are invariant to translation and rotation
around the viewing axis, and they vary slowly with changes of
view angle, scale, and occlusion. The colors in an image
are mapped into a discrete color space containing n colors.
A histogram of image I is an n-dimensional vector, where
each element represents the number of pixels of color j in
image I. Each histogram is normalized so that it represents
the image independently of the image size. The j-th element
of the normalized histogram H(I) is defined as

$$H_j(I) = \frac{\tilde{H}_j(I)}{\sum_{k=1}^{n} \tilde{H}_k(I)}, \qquad (1)$$

where $\tilde{H}_j(I)$ is the number of pixels of color j in image
I. We denote by d(H(I), H(I')) the distance between two
histograms.

Swain and Ballard [7] introduced a histogram matching
method called histogram intersection. Given a pair of
histograms, H(I) and H(I'), of image I and image I',
respectively, each containing n bins, they defined the
histogram intersection as follows:

$$H(I) \cap H(I') = \frac{\sum_{j=1}^{n} \min\big(H_j(I), H_j(I')\big)}{\sum_{j=1}^{n} H_j(I')}. \qquad (2)$$

The denominator normalizes the histogram intersection so
that its value lies between 0 and 1. The measure
H(I) ∩ H(I') is analogous to the $L_1$-norm distance

$$d_{L_1}\big(H(I), H(I')\big) = \sum_{j=1}^{n} \big| H_j(I) - H_j(I') \big|, \qquad (3)$$

as defined by Swain and Ballard [7], Stricker and Swain
[6], and Funt and Finlayson [3]. For normalized histograms
the two are directly related:
$H(I) \cap H(I') = 1 - \tfrac{1}{2} d_{L_1}\big(H(I), H(I')\big)$.
For a given distance threshold T, two histograms are said to
be similar if their distance is less than or equal to T, and the
corresponding database image is retrieved in response to the
query image.

2.2. Quantization

The histogram dimension n (the number of histogram bins)
is determined by the color representation scheme and the
quantization level. Most color spaces represent a color as
a 3D vector with real values (e.g., RGB, CIE XYZ, HSV,
CIELUV). We quantize the three axes of the color space into k
bins for the first axis, l bins for the second axis, and m bins
for the third axis. The histogram can then be represented as an
n-dimensional vector, where n = k × l × m.

In high-resolution schemes, the histogram of the RGB color
space, with a range of [255, 255, 255] on its three axes, is
represented as a $2^{24}$-dimensional vector; the HSV color space,
with a range of [360, 100, 100], as a 3,600,000-dimensional
vector; and the CIELUV color space, with a range of
[100, 354, 262], as a 9,274,800-dimensional vector. These
high-resolution representations, however, are unnecessarily
large for image indexing, because retrieval performance
saturates once the number of bins exceeds a certain value.
For example, the normalized color histogram difference was
a satisfactory measure of frame dissimilarity even when
colors were quantized into only 128 bins (8 green × 8 red
× 4 blue) [4, 9].

3. Color spaces

The word "color" may be interpreted in several ways: a
certain kind of light, its effect on the human eye, or, most
important of all, the result of this effect in the mind of
the viewer [2]. Color is the perceptual result of visible
light, with wavelengths of approximately 380 nm to 750 nm,
incident upon the retina.

3.1. CIE XYZ color space 

The assumption that there are three types of cone 
receptors in the retina is widely accepted, so three 
components are necessary and sufficient to describe a 
color. Accordingly, the set of all perceivable colors can be 
represented within a three-dimensional space. 

The axes of a color space, called primary colors, can
be chosen arbitrarily. A convenient set, universally used
for color measurement, is the CIE 1931 (X, Y, Z) system
adopted by the Commission Internationale de l'Éclairage
(CIE) [10]. Each distinct point in the CIE XYZ space
corresponds to a unique color perception. In this space,
the pure color component in the absence of brightness,
such as hue and chroma, can be represented with the x and y
chromaticity coordinates defined by

$$x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}. \qquad (4)$$

3.2. RGB color space

RGB color space is represented with red (R), green (G), 
and blue (B) primaries and is an additive system. An 
additive RGB system is specified by the chromaticities of its 
primaries and its white point. This system's extent (gamut) 
is given in the (x,y) chromaticity diagram [10] by a triangle 
whose vertices are the chromaticities of the primaries. 

RGB values in a particular set of primaries can be
transformed to and from CIE XYZ by a three-by-three matrix
transform. To transform from RGB into CIE XYZ, the matrix
transform of Eq. (5), taken from [5], is used. Because white
is normalized to unity, the middle row of this matrix sums
to unity. To recover the white point, we transform
RGB = [1, 1, 1] to CIE XYZ before computing x and y.
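Because the matrix of Eq. (5) is not reproduced here, the sketch below assumes, purely for illustration, the widely published linear ITU-R BT.709 (sRGB) primaries with a D65 white point; the paper's own matrix may differ. It shows the 3 × 3 transform, the chromaticity coordinates of Eq. (4), and the white-point recovery described above. The names `RGB_TO_XYZ`, `rgb_to_xyz`, and `chromaticity` are our own.

```python
# Assumed linear RGB -> CIE XYZ matrix for BT.709 primaries and D65 white.
# Illustrative only; not necessarily the matrix of Eq. (5).
RGB_TO_XYZ = (
    (0.412453, 0.357580, 0.180423),
    (0.212671, 0.715160, 0.072169),  # middle row sums to 1, so white has Y = 1
    (0.019334, 0.119193, 0.950227),
)


def rgb_to_xyz(rgb):
    """Apply the 3x3 matrix transform to a linear (r, g, b) triple in [0, 1]."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ)


def chromaticity(xyz):
    """Return the (x, y) chromaticity coordinates of Eq. (4)."""
    X, Y, Z = xyz
    total = X + Y + Z
    return X / total, Y / total


# Recover the white point: transform RGB = [1, 1, 1], then compute x and y.
white = rgb_to_xyz((1.0, 1.0, 1.0))
x_w, y_w = chromaticity(white)
print(f"white XYZ = {tuple(round(v, 4) for v in white)}, (x, y) = ({x_w:.4f}, {y_w:.4f})")
```

With this assumed matrix the recovered white point lies at the D65 chromaticity, approximately (0.3127, 0.3290).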

The RGB color space allows simple and fast
computation. However, it is neither perceptually
uniform nor intuitive.



3.3. HSV color space

The hue (H), saturation (S), and value (V) system is 
based on a warped version of the RGB space and is directly 
related to intuitive color notions of hue, saturation and 
brightness. 

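The conversion from the RGB color space to the HSV
color space is done through the following equations [1]:

$$h' = \begin{cases} (g - b)/\delta, & \text{if } r = \max, \\ 2 + (b - r)/\delta, & \text{if } g = \max, \\ 4 + (r - g)/\delta, & \text{if } b = \max, \end{cases} \qquad (6)$$

$$h = 60\,h' \;(\text{adding } 360 \text{ if } h' < 0), \qquad s = \frac{\max - \min}{\max}, \qquad v = \max, \qquad (7)$$

where max = MAX(r, g, b), min = MIN(r, g, b),
δ = max − min, and (h, s, v) is the corresponding point of
(r, g, b) in the HSV color space (h is undefined when δ = 0).
For r, g, b ∈ [0, 1], the conversion gives h ∈ [0, 360] and
s, v ∈ [0, 1].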
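Eqs. (6) and (7) translate directly into code. The sketch below is a minimal illustration; the function name and the convention of returning h = 0 for achromatic colors (δ = 0, where hue is undefined) are our assumptions.

```python
def rgb_to_hsv(r, g, b):
    """Convert r, g, b in [0, 1] to (h, s, v), h in [0, 360), s and v in [0, 1].

    Follows Eqs. (6)-(7). Hue is undefined when max == min; 0 is returned by
    convention. Saturation is 0 when max == 0 (black).
    """
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn
    v = mx
    s = delta / mx if mx > 0 else 0.0
    if delta == 0:
        return 0.0, s, v  # achromatic: hue undefined
    if r == mx:
        h_prime = (g - b) / delta
    elif g == mx:
        h_prime = 2 + (b - r) / delta
    else:
        h_prime = 4 + (r - g) / delta
    h = 60.0 * h_prime
    if h < 0:
        h += 360.0  # keep hue in [0, 360)
    return h, s, v


print(rgb_to_hsv(1.0, 0.0, 0.5))  # (330.0, 1.0, 1.0)
```

Python's standard `colorsys.rgb_to_hsv` implements the same hexcone model but returns hue scaled to [0, 1).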

The intuitiveness of the HSV color space is very useful
because each axis can be quantized independently. Wan and
Kuo [9] reported that a color quantization scheme based on
the HSV color space performed much better than one based
on the RGB color space. Because the HSV conversion uses
different computations for different 60-degree segments of the
hue circle, visible discontinuities occur in the color
space. Moreover, the HSV color space, obtained from the RGB
color space by a simple transformation, is not perceptually
uniform either.

3.4. CIELUV color space 

Consider the distance from color $C_1 = (X_1, Y_1, Z_1)$
to color $C_1 + \Delta C$, and the distance from color
$C_2 = (X_2, Y_2, Z_2)$ to color $C_2 + \Delta C$, where
$\Delta C = (\Delta X, \Delta Y, \Delta Z)$. Both distances are equal
to ΔC, yet in general they will not be perceived as being
equal [1]. Perceptual uniformity means that the same ΔC at
two different points in the color space produces an equal
perceived color difference. The CIE XYZ, RGB, and HSV
color spaces do not exhibit perceptual uniformity. Two
perceptually uniform color spaces have been agreed as CIE
standards; one of the two is the CIELUV color space (the
other is CIELAB). The three variables L*, u*, and v* are
defined by [10]:

$$L^* = \begin{cases} 116\,(Y/Y_n)^{1/3} - 16, & \text{if } Y/Y_n > 0.008856, \\ 903.3\,(Y/Y_n), & \text{otherwise}, \end{cases} \qquad (8)$$

$$u^* = 13L^*(u' - u'_n), \qquad v^* = 13L^*(v' - v'_n), \qquad (9)$$

where $u'$, $v'$ and $u'_n$, $v'_n$ are calculated from

$$u' = \frac{4X}{X + 15Y + 3Z}, \quad v' = \frac{9Y}{X + 15Y + 3Z}, \quad u'_n = \frac{4X_n}{X_n + 15Y_n + 3Z_n}, \quad v'_n = \frac{9Y_n}{X_n + 15Y_n + 3Z_n}. \qquad (10)$$

The tristimulus values $X_n$, $Y_n$, and $Z_n$ are those of the
illuminant, with $Y_n$ equal to 1. The total color difference
$\Delta E^*_{uv}$ between two colors in the CIELUV color space is
calculated from

$$\Delta E^*_{uv} = \left[(\Delta L^*)^2 + (\Delta u^*)^2 + (\Delta v^*)^2\right]^{1/2}, \qquad (11)$$

where ΔL*, Δu*, and Δv* are the differences in L*, u*, and
v* between the two colors, respectively.
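The sketch below implements Eqs. (8) to (11). The D65 white point used in the usage lines is an assumption (scaled so that $Y_n = 1$, as in the text); any illuminant can be passed in. Function names are ours.

```python
def xyz_to_luv(xyz, white):
    """Convert CIE XYZ to CIELUV (L*, u*, v*) using Eqs. (8)-(10).

    `white` is the illuminant (X_n, Y_n, Z_n), on the same scale as `xyz`.
    """
    X, Y, Z = xyz
    Xn, Yn, Zn = white
    yr = Y / Yn
    L = 116.0 * yr ** (1.0 / 3.0) - 16.0 if yr > 0.008856 else 903.3 * yr  # Eq. (8)
    denom_n = Xn + 15.0 * Yn + 3.0 * Zn
    un, vn = 4.0 * Xn / denom_n, 9.0 * Yn / denom_n                          # Eq. (10)
    denom = X + 15.0 * Y + 3.0 * Z
    if denom == 0.0:  # black: u', v' undefined, but L* = 0 makes u* = v* = 0
        return 0.0, 0.0, 0.0
    u_p, v_p = 4.0 * X / denom, 9.0 * Y / denom                            # Eq. (10)
    return L, 13.0 * L * (u_p - un), 13.0 * L * (v_p - vn)                  # Eq. (9)


def delta_e_uv(luv1, luv2):
    """Total color difference Delta E*_uv of Eq. (11)."""
    return sum((a - b) ** 2 for a, b in zip(luv1, luv2)) ** 0.5


D65_WHITE = (0.950456, 1.0, 1.088754)  # assumed illuminant, Y_n = 1

print(xyz_to_luv(D65_WHITE, D65_WHITE))  # the white point itself: (100.0, 0.0, 0.0)
```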

4. Weighted quantization in CIELUV space 

In uniform quantization, each axis of the color space
is uniformly subdivided into a prespecified number of
bins, and all bins have the same size. The advantage
of the uniform splitting method is that it is straightforward
and simple [9]. However, the disadvantage of uniform
quantization for perceptually non-uniform color spaces,
such as the RGB and HSV spaces, is that it does not take into
account the perceptual similarity between different bins. To
overcome this problem, we select the CIELUV color space,
which is the same color space that Taycher [8] chose for
uniform quantization.

Figure 1 shows the RGB space and the CIELUV color space
surrounded by the LUV space. The LUV space is represented by
the uniformly quantized lattices shown in Figures 1(b)
and 1(c). Each bin in the LUV space includes a different
volume of the CIELUV color space, which is obtained by a
two-step transformation of the RGB space using Eqs. (5), (8),
and (9).
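Assigning a CIELUV value to a bin of the uniformly quantized LUV lattice, and building the normalized histogram of Eq. (1) over the n = k × l × m bins, can be sketched as below. The bounding box `HYPOTHETICAL_LUV_BOUNDS` is an assumption: its widths match the [100, 354, 262] ranges quoted in Section 2.2, but the offsets are illustrative, and the function names are ours.

```python
# Hypothetical LUV box; widths 100, 354, 262 match Section 2.2, offsets are illustrative.
HYPOTHETICAL_LUV_BOUNDS = ((0.0, 100.0), (-134.0, 220.0), (-140.0, 122.0))


def luv_bin_index(luv, bounds, steps):
    """Map (L*, u*, v*) to the index of its bin in a uniform k x l x m lattice.

    Values on an upper boundary go into the last bin of that axis.
    """
    idx = []
    for value, (lo, hi), count in zip(luv, bounds, steps):
        if not lo <= value <= hi:
            raise ValueError(f"value {value} outside [{lo}, {hi}]")
        t = (value - lo) / (hi - lo)
        idx.append(min(int(t * count), count - 1))
    i, j, k = idx
    _, l_bins, m_bins = steps
    return (i * l_bins + j) * m_bins + k  # row-major order over the lattice


def luv_histogram(luv_pixels, bounds, steps):
    """Normalized histogram H(I) of Eq. (1) over the n = k * l * m LUV bins."""
    k, l, m = steps
    counts = [0] * (k * l * m)
    for luv in luv_pixels:
        counts[luv_bin_index(luv, bounds, steps)] += 1
    return [c / len(luv_pixels) for c in counts]


print(luv_bin_index((50.0, 43.0, -9.0), HYPOTHETICAL_LUV_BOUNDS, (5, 5, 5)))  # 62
```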

Here, we introduce the weighted quantization method
in LUV space. In this method, each bin in LUV space
has a different weighting factor according to the volume
of CIELUV color space included within it. We call
this method weighted LUV quantization. The weighting
factor for each bin is applied to the calculation of the
histogram intersection in the form of

$$\big[H(I) \cap H(I')\big]_w = \frac{\sum_{j=1}^{n} W_j \min\big(H_j(I), H_j(I')\big)}{\sum_{j=1}^{n} W_j H_j(I')}, \qquad (12)$$

where $W_j$ is the weight for bin j in the LUV space.
We denote this histogram intersection, written with subscript w,
$[H(I) \cap H(I')]_w$, as the weighted histogram intersection.
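A minimal sketch of the plain intersection of Eq. (2) and its weighted form, Eq. (12). The function name is ours, and the weighted denominator $\sum_j W_j H_j(I')$ is an assumption that mirrors the normalization of Eq. (2), which keeps the result in [0, 1]. With all weights equal to 1 the function reduces to Eq. (2).

```python
def histogram_intersection(h_query, h_db, weights=None):
    """Histogram intersection of Eq. (2), or its weighted form, Eq. (12).

    h_query, h_db: normalized histograms H(I) and H(I') with n bins each.
    weights: per-bin weights W_j; None means W_j = 1, which gives Eq. (2).
    Assumes the weighted denominator is sum_j W_j * H_j(I'), mirroring Eq. (2).
    """
    if len(h_query) != len(h_db):
        raise ValueError("histograms must have the same number of bins")
    if weights is None:
        weights = [1.0] * len(h_db)
    num = sum(w * min(a, b) for w, a, b in zip(weights, h_query, h_db))
    den = sum(w * b for w, b in zip(weights, h_db))
    return num / den if den > 0 else 0.0


h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
plain = histogram_intersection(h1, h2)                      # 0.5
l1 = sum(abs(a - b) for a, b in zip(h1, h2))                # 1.0, and 1 - l1 / 2 = 0.5
weighted = histogram_intersection(h1, h2, [0.0, 1.0, 2.0])  # 0.25 / 1.25 = 0.2
```

Setting a weight to zero removes that bin from the comparison entirely, which is how empty bins drop out in the weighted scheme.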

Figure 1. RGB space and a perspective view of the CIELUV color space surrounded by LUV space.

The weight for each bin is determined by the following
procedure:

1) Transform the R, G, and B values at the lattice points
of an RGB color space, uniformly subdivided with a
prespecified number of points, into the L*, u*, and v*
values of the CIELUV color space.

2) Compute the probability as an estimate of the volume
included in each bin:

$$P_i = \frac{N_i}{\sum_{k=1}^{n} N_k}, \qquad i = 1, 2, \ldots, n, \qquad (13)$$

where $N_i$ is the number of pixels in bin i, n is the total
number of bins in the space, and $P_i$ is the probability
that a pixel will fall in bin i.

3) Compute the weight

$$W_i = \begin{cases} 0, & \text{if } P_i = 0, \\ \ldots, & \text{otherwise}, \end{cases} \qquad (14)$$

where $W_i$ is the weight for bin i. The smaller the volume of
the CIELUV color space a bin includes, the larger its weight.
That is, bins that include highly saturated colors receive
larger weights.
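The three-step procedure can be sketched as follows. Everything beyond the steps themselves is an assumption: the BT.709/D65 matrix stands in for Eq. (5), the LUV box is taken as the bounding box of the transformed lattice, and `weight_fn` is a hypothetical decreasing function of $P_i$ (the usage line passes 1/P_i), because the nonzero branch of Eq. (14) is not given here.

```python
# Assumed BT.709/D65 linear RGB -> XYZ matrix (illustrative; not the paper's Eq. (5)).
M = ((0.412453, 0.357580, 0.180423),
     (0.212671, 0.715160, 0.072169),
     (0.019334, 0.119193, 0.950227))
WHITE = tuple(sum(row) for row in M)  # XYZ of R = G = B = 1, so Y_n = 1


def rgb_to_luv(rgb):
    """Two-step transform RGB -> XYZ -> CIELUV (Eqs. (8)-(10))."""
    X, Y, Z = (sum(m * c for m, c in zip(row, rgb)) for row in M)
    Xn, Yn, Zn = WHITE
    yr = Y / Yn
    L = 116.0 * yr ** (1 / 3) - 16.0 if yr > 0.008856 else 903.3 * yr
    d, dn = X + 15 * Y + 3 * Z, Xn + 15 * Yn + 3 * Zn
    if d == 0:
        return 0.0, 0.0, 0.0
    return L, 13 * L * (4 * X / d - 4 * Xn / dn), 13 * L * (9 * Y / d - 9 * Yn / dn)


def bin_weights(samples_per_axis, steps, weight_fn):
    """Return (weights W_i, probabilities P_i) for a k x l x m LUV lattice."""
    # Step 1: transform a uniform RGB lattice into CIELUV.
    s = samples_per_axis
    grid = [i / (s - 1) for i in range(s)]
    luv = [rgb_to_luv((r, g, b)) for r in grid for g in grid for b in grid]
    # The LUV box is the bounding box of the transformed lattice (assumption).
    bounds = [(min(p[a] for p in luv), max(p[a] for p in luv)) for a in range(3)]
    k, l, m = steps
    counts = [0] * (k * l * m)
    for p in luv:
        idx = [min(int((p[a] - lo) / (hi - lo) * c), c - 1)
               for a, ((lo, hi), c) in enumerate(zip(bounds, steps))]
        counts[(idx[0] * l + idx[1]) * m + idx[2]] += 1
    # Step 2: P_i = N_i / sum_k N_k, Eq. (13).
    total = sum(counts)
    probs = [c / total for c in counts]
    # Step 3: W_i = 0 for empty bins, otherwise a decreasing function of P_i.
    return [0.0 if p == 0 else weight_fn(p) for p in probs], probs


weights, probs = bin_weights(9, (5, 5, 5), lambda p: 1.0 / p)  # hypothetical W_i = 1 / P_i
```

Because the RGB gamut occupies only part of its LUV bounding box, some bins receive no lattice points and get zero weight, which is how the scheme discards perceptually useless bins.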

Weighted LUV quantization is designed to provide
the advantage of perceptual uniformity in the histogram
similarity comparison. With this weighting scheme,
bins that do not contribute to perceptual discrimination are
eliminated, and the weights emphasize the perceptual
sensitivity to highly saturated colors.

5. Experimental results

Experiments were carried out to evaluate the color
indexing performance of weighted LUV quantization. For
comparison, the histogram intersection is calculated in the
HSV space and the LUV space with typical uniform
quantization, and in the LUV space with weighted quantization.
In computing the weighting factors, we used enough
samples for the probability $P_i$ to be proportional to the
volume of the CIELUV color space included within each bin.

Frames captured from "Star TV" and the movie "Deep
Impact" were used as database images. We denote the Star
TV images as the first database set and the Deep Impact
images as the second database set. The first database set
has 47 images and the second database set has 63 images.
A query image was selected from a database set, and its
histogram was compared with the histograms of the
remaining images in that database set.

Figure 2 shows the color indexing results for the first
database set under each quantization scheme. The first two
columns in Figure 2 show the sequence of original images,
and the next three columns display the top ten most similar
images ranked by three quantization schemes: uniform
quantization of the HSV space (HSV), typical uniform
quantization of the LUV space (LUV), and weighted LUV
quantization (LUVw). The lowermost image in the column of
each quantization scheme is the query image, followed by
nine matched images in order of similarity. The numbers
below each image are its sequence number in the original
scene and its normalized similarity to the query image
(italicized number).

Table 1 shows the variance (VAR) of the similarities of the
three relevant images (the query image and the two top-ranked
images) and the percentage difference (AD(%)) between the
average histogram similarity of the three relevant images and
the average histogram similarity of the seven non-relevant
images. "Q. Step" denotes the quantization step used. The
adaptive quantization step is 15 × 9 × 9 for the HSV space
and 7 × 13 × 13 for the LUV space, which produces the
desirable results described by Wan and Kuo [9].
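The two measures in Tables 1 and 2 can be sketched as below. The exact definitions are not fully specified in the text, so this assumes VAR is the population variance of the relevant images' normalized similarities and AD(%) is the difference of the two average similarities multiplied by 100; the function name and the sample scores are hypothetical.

```python
from statistics import mean, pvariance


def discrimination_measures(relevant, non_relevant):
    """Return (VAR, AD_percent) for normalized similarity scores in [0, 1].

    relevant: similarities of the relevant images (including the query itself).
    non_relevant: similarities of the non-relevant images among the top ten.
    Assumes population variance and a difference of means expressed in percent.
    """
    var = pvariance(relevant)
    ad_percent = 100.0 * (mean(relevant) - mean(non_relevant))
    return var, ad_percent


# Hypothetical scores: the query (1.0) and two relevant matches, then seven others.
var, ad = discrimination_measures([1.0, 0.9, 0.8], [0.5] * 7)
```

A good scheme should give a small VAR and a large AD(%), which is the criterion applied to the tables.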

Figure 3 shows the color indexing results for the second
database set. The quantization step used is the adaptive step
for HSV and LUV and the 5 × 5 × 5 step for LUVw.



Figure 2. Comparison of the uniform 
quantization in HSV space and LUV space for 
the first database set 



Figure 3. Comparison of uniform quantization 
in the HSV space and the LUV space for the 
second database set. 



Table 1. Comparison of the variance and the
percentage difference of the average
similarity for the first database set.

Q. Step     Measure    HSV       LUVw      LUV
3x3x3       VAR        0.0006    0.0087    0.00013
            AD(%)      7.60      39.23     6.29
5x5x5       VAR        0.0016    0.0012    0.00052
            AD(%)      10.27     60.02     5.20
Adaptive    VAR        0.0024    0.0032    0.00068
            AD(%)      14.64     57.51     11.01



Table 2. Comparison of the variance and the
percentage difference of the average
similarity for the second database set.

Q. Step     Measure    HSV       LUVw      LUV
5x5x5       VAR        0.00012   0.0007    0.00004
            AD(%)      11.08     33.80     10.84
Adaptive    VAR        0.00036   0.0017    0.00006
            AD(%)      17.98     34.69     11.12



Table 2 shows the variance (VAR) of the similarities of the
four relevant images and the percentage difference (AD(%))
between the average histogram similarity of the four relevant
images and that of the six non-relevant images for the second
database set.

For all quantization steps, the uniform LUV quantization
scheme has the smallest variance for the relevant images,
and the weighted LUV quantization scheme has the
largest difference in average similarity between
the relevant and the non-relevant images. For
image indexing, a large difference in average similarity
and a small variance are both useful for discriminating
between relevant and non-relevant images. From this
viewpoint, the weighted LUV quantization scheme shows
good discrimination performance. Even at the low-resolution
quantization step of 5 × 5 × 5, the proposed method gives the
best discrimination, with an acceptably small variance.

6. Conclusion 

We have proposed the weighted LUV quantization scheme to
improve color indexing performance. For the given
database sets, experimental results show that the weighted
LUV quantization method gives better performance than the
other schemes, even at low quantization resolution.

The proposed weighted LUV quantization uses a
complete three-dimensional subdivision scheme. To speed
up processing, a pseudo-three-dimensional subdivision
scheme, in which the three axes are quantized
independently of each other, could be applied. Another
interesting issue is to find a method of assigning weighting
factors to special bins for indexing images with special
colors, such as flesh tones. These issues remain for further
investigation.
100 x 100 gray scale image with 256 gray levels. Table I shows the
average processing time and computational complexity for Wang and
Pavlidis' method [1] and the proposed method. As shown in Table I,
when topographic features are extracted with the proposed method,
the processing is approximately five times as fast as with Wang and
Pavlidis' method.

TABLE I 

Average Processing Time and Computational 
Complexity for Each Method 





Method                       Average processing time    Computational complexity
Wang and Pavlidis' method    …                          …
Proposed method              0.76                       …



V. Concluding Remarks 

In this paper, we proposed a new method for extracting topographic
features directly from a gray scale character image without
calculating the eigenvalues and eigenvectors of the underlying image
intensity surface.

For real-world character images, the gradient magnitude is rarely
zero at the center of a pixel. Therefore, if Wang and Pavlidis'
method is used for topographic feature extraction, the eigenvectors
must mostly be approximated by two perpendicular directions, and
computing these approximated directions requires considerable
effort. In addition, such an approximation of the eigenvectors may
produce unnecessary ridges and peaks at places such as bend points,
starting points, end points, or the corresponding hillsides.

When topographic features are extracted with the proposed method,
the processing is approximately five times as fast as with Wang and
Pavlidis' method, because only simple comparison operations are
used. Moreover, unnecessary topographic features are not extracted
at bend points, starting points, and end points, because the local
information of the gray scale character image is taken into account
in determining the principal orthogonal elements. In addition,
experimental results with Levi and Montanari's data revealed that the
proposed method was also very effective for gray scale skeletonization
for character recognition.

Acknowledgments 

The authors wish to thank the anonymous reviewers for their
helpful comments in improving the earlier draft of this paper. This
research was supported by the 1992 Directed Basic Research Fund of
Korea Science and Engineering Foundation.

References 

[1] L. Wang and T. Pavlidis, "Direct gray scale extraction of features for character recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1,053-1,067, Oct. 1993.

[2] T. Pavlidis and Wolberg, "An algorithm for the segmentation of bilevel images," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 570-575, Miami, June 1986.

[3] H. Murakami and B.V.K.V. Kumar, "Efficient calculation of primary images from a set of images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 4, pp. 511-515, 1982.

[4] C.M. Leung, "A practical basis set for Chinese character recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 532-537, San Francisco, June 1985.

[5] G. Levi and U. Montanari, "A gray-weighted skeleton," Information and Control, vol. 17, pp. 62-91, 1970.

[6] S.W. Lam, A.C. Girardin, and S.N. Srihari, "Gray scale character recognition using boundary features," Proc. SPIE Conf. Machine Vision Applications in Character Recognition and Industrial Inspection, vol. 1,661, pp. 98-105, San Jose, 1992.

[7] T. Pavlidis, "Recognition of printed text under realistic conditions," Proc. Second IPTP Conf., pp. 65-76, Tokyo, Jan. 1992.

[8] T. Pavlidis, L. Wang, J. Zhou, W.J. Sakoda, and J. Rocha, "Recognition of poorly printed text by direct extraction of features from gray scale," Proc. SPIE Conf. Machine Vision Applications in Character Recognition and Industrial Inspection, vol. 1,661, pp. 1-9, San Jose, Feb. 1992.

[9] L. Wang and T. Pavlidis, "A geometric approach to machine-printed character recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 665-668, Champaign, Ill., June 1992.

[10] R.M. Haralick, L.T. Watson, and T.J. Laffey, "The topographic primal sketch," Int'l J. Robotics Research, vol. 2, pp. 50-72, 1983.

[11] D.H. Kim, Y.S. Hwang, S.T. Park, E.J. Kim, S.H. Pack, and S.Y. Bang, "Handwritten Korean character image database PE92," Proc. Second Int'l Conf. Document Analysis and Recognition, pp. 470-473, Tsukuba Science City, Japan, Oct. 1993.

[12] L. Lam, S.-W. Lee, and C.Y. Suen, "Thinning methodologies - a comprehensive survey," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, no. 9, pp. 869-885, Sept. 1992.

[13] S.-W. Lee, L. Lam, and C.Y. Suen, "A systematic evaluation of skeletonization algorithms," Pattern Recognition and Artificial Intelligence, vol. 7, no. 5, pp. 1,203-1,225, 1993.



Efficient Color Histogram 
Indexing for Quadratic Form 
Distance Functions 

James Hafner, Harpreet S. Sawhney, Will Equitz,
Myron Flickner, and Wayne Niblack

Abstract: In image retrieval based on color, the weighted distance
between the color histograms of two images, represented as a quadratic
form, may be defined as a match measure. However, this distance measure
is computationally expensive (naively O(N^2), and at best O(N), in the
number N of histogram bins), and it operates on high-dimensional
features (O(N)). We propose the use of low-dimensional, simple to
compute distance measures between the color distributions, and show that
these are lower bounds on the histogram distance measure. Results on
color histogram matching in large image databases show that prefiltering
with the simpler distance measures leads to significantly less time
complexity, because the quadratic histogram distance is now computed on
a smaller set of images. The low-dimensional distance measure can also
be used for indexing into the database.

Index Terms: Color histogram matching, image querying, image databases,
efficient multidimensional feature matching, histogram indexing.

I. Introduction 

In the query by image content (QBIC) project, we are developing a
system for efficient indexing and retrieval of images from a large
database based on their content, defined in terms of shapes, colors,
textures, and user sketches [16]. Other efforts toward similar goals
are presented in [2], [7], [8], [11], [12], [19].
In this paper we present efficient methods for retrieving images
based on their color content. Color histograms are a way to represent
the distribution of colors in images, where each histogram bin
represents a color in a suitable color space (RGB, Lab, etc.; see, for
example, [17]). A distance, or more precisely a pseudometric or
pseudonorm, usually represented by a quadratic form, between a query
image histogram and a data image histogram can be used to define the
similarity match between the two distributions. However, because the
histogram is typically a high-dimensional distribution
(N = 256 or 64 colors, for instance), the distance measure is
computationally expensive. A naive implementation of this as a quadratic
form requires O(N^2) operations, though this can be improved to O(N)
by some precomputation (e.g., by diagonalizing the quadratic form).
Also, indexing on such high-dimensional features is typically not
feasible. Moreover, for large image databases, it is generally not
feasible to compute the match measure against every image (O(M)
computation if M is the size of the database). One has to generate
low-dimensional indices so that, using standard database indexing
methods (see, for example, [1]), retrieval involves only O(log M)
comparisons. The difficulty of indexing high-dimensional features in
large databases is referred to as the dimensionality curse in
multidimensional indexing [1]. Even efficient data structures for
database indexing, like R*-trees [18], work well for only up to 20
dimensions. Furthermore, the choice of low-dimensional features
should satisfy the completeness property [1]: the features should not
lead to any false misses. Efficiency should not come at the expense
of correctness.

We propose the use of low-dimensional, simple to compute distance
measures between the color distributions, and show that these are
lower bounds on the histogram distance measure in certain fairly
general cases. Thus, similarity retrieval based on the cheaper measure
achieves both the goals of low dimensionality and completeness. Our
results on color histogram matching in large image databases show
that prefiltering with simpler distance measures leads to a considerable
time saving, because the quadratic form is now computed on a
smaller set of images. Furthermore, the methodology is applicable to
many other situations that involve the computation of a quadratic form
distance measure between two distributions.

II. The Problem 

Let x and y be two N-dimensional distributions, color histograms
for instance. For retrieval based on the similarity of the two
distributions, a distance (defined here to be a pseudometric or
pseudonorm) between the two can be defined as a match measure. A
weighted form of such a distance measure can be represented as a
quadratic form:

$$d^2_{hist}(x, y) = (x - y)^T A (x - y), \qquad (1)$$

where $A = [a_{ij}]$ is a matrix and the weights $a_{ij}$ denote the
similarity between bins i and j. These weights can be normalized so
that $0 \le a_{ij} \le 1$, with $a_{ii} = 1$, large $a_{ij}$ denoting
similarity between bins i and j, and small $a_{ij}$ denoting
dissimilarity. The two distributions can also be normalized so that
$0 \le x_i, y_i \le 1$ and $\sum_i x_i = \sum_i y_i = 1$. It should be
emphasized that $d^2_{hist}$ can be a positive semidefinite form even
when A is an indefinite matrix, because of the normalization
constraints on the distributions. For now, it is assumed that the
above equations indeed define a positive semidefinite form. It will be
shown later that under certain conditions on the $a_{ij}$'s, it always
has this property. Besides, for any specific application or choice of
matrix A, this can easily be tested numerically.

If the database records are images, and the histograms are the
color distributions in the images, then $d_{hist}$ represents a generalized
quadratic histogram distance measure for color matching. Each bin or
dimension of the histogram corresponds to a particular color in some
chosen color space. Typically, 256 colors are adequate to capture the
color distributions of most natural scenes. In contrast with direct
Euclidean distance, the general quadratic form allows for similarity
matching between different colors (represented by the histogram
bins). (See the example in the next section.) Each entry $a_{ij}$ in the
"similarity matrix" A attempts to capture the perceptual similarity
between the colors represented by bins i and j. This method of
comparing histograms is more sophisticated than currently popular
methods such as that of Swain and Ballard [20] and, based on our
experience with the QBIC system, corresponds more closely to human
judgment of color similarity.

However, as has already been noted, the histogram quadratic form 
measure is computationally intensive and it operates on high dimen- 
sional features. The problem at hand is to define a considerably less 
expensive measure on considerably lower dimensional features so 
that: 

1) the cheaper distance measure can be used to filter a large frac- 
tion of the database without any misses, 

2) the expensive match measure computation can be limited to the 
small set of images retrieved, and 

3) the database can be organized in terms of the low dimensional 
indices. 

We show that, under fairly general conditions, the cheaper distance,
call it $d_k$ for now, can be bounded by $d_{hist}$ in the form
$\lambda d_k \le d_{hist}$ for some positive constant λ. Thus, in order to
retrieve images satisfying $d_{hist} \le \epsilon$, the inequality
$\lambda d_k \le \epsilon$ can be used to retrieve images quickly and
without misses. The expensive measure will then have to be applied
only to the filtered set of images.
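The filter-then-verify logic can be sketched generically. The cheap distance actually used by the authors is not defined in this part of the text, so the usage lines substitute hypothetical stand-ins: Euclidean distance plays the expensive measure and the gap in the first coordinate plays the cheap one, which is a valid lower bound with λ = 1 because $|z_1| \le \|z\|$. All names here are ours.

```python
import math


def range_query(query, database, cheap, expensive, lam, eps):
    """Return (matches, n_candidates) for expensive(query, x) <= eps.

    Assumes lam * cheap(q, x) <= expensive(q, x) for all q, x, with lam > 0.
    Every true match then satisfies lam * cheap(q, x) <= eps, so filtering on
    the cheap distance first can discard records but never misses a match.
    """
    candidates = [key for key, x in database.items() if lam * cheap(query, x) <= eps]
    matches = [key for key in candidates if expensive(query, database[key]) <= eps]
    return matches, len(candidates)


# Hypothetical stand-ins for the expensive and cheap distances.
def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def first_coordinate_gap(a, b):
    return abs(a[0] - b[0])


db = {"a": (0.0, 0.0), "b": (0.1, 0.1), "c": (0.9, 0.0), "d": (0.05, 0.9)}
matches, n_candidates = range_query((0.0, 0.0), db, first_coordinate_gap, euclidean, 1.0, 0.2)
print(matches, n_candidates)  # ['a', 'b'] 3
```

Only the three candidates that pass the cheap test are checked with the expensive measure, yet the result equals an exhaustive search.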

The application domain is discrete distribution matching, where 
we use additional information about the definitions of the distribution 
bins. Specifically, the similarity matrix A is used to construct a 
cheaper distance measure. 

If the similarity information is ignored, then the problem is a
straightforward distribution matching problem. Swain and Ballard
[20] treat their color matching problem in this way, and use the $L_1$
distance as a measure of the distance between two histograms. Other
obvious candidates are the "relative entropy" or "Kullback-Leibler"
distance [3] and the distance implied by the standard chi-squared
statistical test [15]. Ioka [10] used the histogram quadratic form
distance, but did not use a cheaper distance as a lower bound to
efficiently filter out unwanted records.

III. Histogram Distance 

Given the N-dimensional normalized distributions x and y, if
z = x − y, then $-1 \le z_i \le 1$, $\sum_i z_i = 0$, and
$d^2_{hist}(x, y) = z^T A z$. We call the set of such z's the histogram
space (though it is not technically a linear space).

Without loss of generality, A can be assumed to be a symmetric
matrix, because the antisymmetric part does not contribute to the
quadratic form (for any z, $z^T (A - A^T) z = 0$).

An example will illustrate why similarity weighted comparison 
between color histograms leads to perceptually desirable results in 
contrast with comparison based on direct Euclidean distance between 
the distributions. For simplicity, consider a histogram distribution of 
three colors, say red, orange, and blue, with 
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1.0 
0.9 
0.0 



0.9 
1.0 
0.0 



where red and orange are considered highly similar. Consider a pure red image, $x = [1.0, 0.0, 0.0]^T$, and a pure orange image, $y = [0.0, 1.0, 0.0]^T$. The (squared) histogram distance of (1) is 0.2. This low distance reflects the perceptual similarity of the two images, although their distributions populate distinct bins of the histogram, so that their squared Euclidean distance is 2.0.
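As a quick check of the arithmetic in the red/orange example, the following Python sketch evaluates the quadratic-form distance (1) and the squared Euclidean distance for the two pure-color histograms. The function names are illustrative and are not taken from any system described here.

```python
def quad_form_distance(x, y, A):
    """d_hist(x, y) = (x - y)^T A (x - y), for plain Python lists."""
    z = [xi - yi for xi, yi in zip(x, y)]
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

def squared_euclidean(x, y):
    """Squared Euclidean distance, i.e. the quadratic form with A = I."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Similarity matrix of the example: red and orange are highly similar.
A = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
red = [1.0, 0.0, 0.0]
orange = [0.0, 1.0, 0.0]

print(round(quad_form_distance(red, orange, A), 6))  # 0.2
print(squared_euclidean(red, orange))                # 2.0
```

The off-diagonal entry 0.9 contributes $-2 \times 0.9$ to the sum, which is what pulls the distance down from 2.0 to 0.2.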

It is now shown that for certain choices of A, $d_{hist}$ is indeed nonnegative on the histogram space.

First note that a sufficient condition for $d_{hist}$ to be nonnegative is that the matrix A be positive semidefinite (PSD). But this assumption is not necessary because of the condition $\sum_i z_i = 0$. In fact, in our experiments the A's chosen are not PSD (or PD).

Next, it is shown in [5] that a quadratic form $z^T H z$, $H = [h_{ij}]$, on the subspace $\sum_i z_i = 0$ is negative semidefinite ($z^T H z \le 0$) if each $h_{ij}$ represents the distance between some points $P_i$ and $P_j$ in some finite-dimensional $L_2$ or normed space. In particular, we have 1) $h_{ii} = 0$, 2) $h_{ij} = h_{ji}$, and 3) $h_{ij} \le h_{ik} + h_{kj}$. This holds for arbitrary N (the dimensionality of H) and is also independent of the dimensionality of the space over which the points $P_i$ are defined.

Now, let $d_{ij}$ be the Euclidean ($L_2$) distance between colors i and j in some color space, for instance, [R(ed), G(reen), B(lue)], or [L, u, v], or the Munsell color space. Therefore, the $d_{ij}$ satisfy the requirements (stated above) for $h_{ij}$.

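We show that for one reasonable choice of $a_{ij}$ in terms of $d_{ij}$, $d_{hist}$ indeed is nonnegative. Let $d_{max} = \max_{ij}(d_{ij})$ and

$$a_{ij} = 1 - d_{ij}/d_{max}. \qquad (2)$$

Then $d_{hist} = z^T A z = \sum_i \sum_j z_i z_j (1 - d_{ij}/d_{max})$. Given that $\sum_i z_i = 0$, we have

$$d_{hist} = -\frac{1}{d_{max}} \sum_i \sum_j z_i z_j d_{ij} = -\frac{1}{d_{max}} z^T H z, \quad \text{where } H = [d_{ij}].$$

Using the result stated above, $z^T H z \le 0$, so that $d_{hist} \ge 0$.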
Alternatively, another choice for $a_{ij}$ is

$$a_{ij} = \exp\left(-\sigma (d_{ij}/d_{max})^2\right) \qquad (3)$$

for some positive constant $\sigma$. These $a_{ij}$'s enforce a faster roll-off as a function of $d_{ij}$. Clearly, for $\sigma$ sufficiently large, each of these matrices is PD, since it becomes diagonally dominant. For the specific values of $d_{ij}$ and $\sigma$ we have used in practice, these matrices have not always been PD, but have had the property that the induced quadratic form is nonnegative. This was verified numerically in our examples and is probably true in general, though we do not have a proof of this.

Other choices of similarity metrics that capture perceptual similarity of colors and make $d_{hist} \ge 0$ are valid also. We have used the forms (2) and (3) in our experiments on real images.
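The two similarity forms can be sketched in Python as below. This is a minimal sketch assuming Euclidean bin-color distances and a hypothetical palette of 16 random colors; like the numerical verification mentioned above, the random-sampling check only spot-checks that $d_{hist}$ stays nonnegative on the histogram space and proves nothing.

```python
import math
import random

def pairwise_distances(colors):
    """d_ij: Euclidean (L2) distance between bin colors i and j."""
    return [[math.dist(ci, cj) for cj in colors] for ci in colors]

def similarity_linear(d):
    """Form (2): a_ij = 1 - d_ij / d_max."""
    d_max = max(max(row) for row in d)
    return [[1.0 - dij / d_max for dij in row] for row in d]

def similarity_exp(d, sigma):
    """Form (3): a_ij = exp(-sigma * (d_ij / d_max)**2)."""
    d_max = max(max(row) for row in d)
    return [[math.exp(-sigma * (dij / d_max) ** 2) for dij in row] for row in d]

def quad_form(z, A):
    """z^T A z for plain Python lists."""
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

def random_histogram(n, rng):
    """A random normalized histogram: nonnegative entries summing to 1."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]

def min_sampled_distance(A, trials=1000, seed=0):
    """Smallest d_hist over random histogram pairs (a spot check, not a proof)."""
    rng = random.Random(seed)
    n = len(A)
    worst = math.inf
    for _ in range(trials):
        x, y = random_histogram(n, rng), random_histogram(n, rng)
        z = [xi - yi for xi, yi in zip(x, y)]
        worst = min(worst, quad_form(z, A))
    return worst

# Hypothetical palette: 16 random colors in a 3-D color space.
rng = random.Random(42)
palette = [(rng.random(), rng.random(), rng.random()) for _ in range(16)]
d = pairwise_distances(palette)
A_lin = similarity_linear(d)
A_exp = similarity_exp(d, sigma=5.0)
print(min_sampled_distance(A_lin), min_sampled_distance(A_exp))
```

Note that `A_lin` is generally not PSD on its own; the sampled values stay nonnegative because every $z$ here sums to zero, which is exactly the argument given for form (2).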

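Now, $d_{hist}$ can be written in a different form by eliminating the constraint $\sum_i z_i = 0$:

$$d_{hist} = z^T A z = \begin{bmatrix} \hat{z}^T & z_N \end{bmatrix} \begin{bmatrix} A_{N-1} & a_N \\ a_N^T & a_{NN} \end{bmatrix} \begin{bmatrix} \hat{z} \\ z_N \end{bmatrix},$$

where $\hat{z} = [z_1 \cdots z_{N-1}]^T$ and A has been decomposed into its top left $(N-1) \times (N-1)$ component $A_{N-1}$ and the rest ($a_N$ being the Nth column of A less the last entry $a_{NN}$). Applying $\sum_i z_i = 0$ (i.e., $z_N = -\mathbf{1}^T \hat{z}$) to this, we get

$$d_{hist} = \hat{z}^T \left[ A_{N-1} - a_N \mathbf{1}^T - \mathbf{1} a_N^T + a_{NN} \mathbf{1}\mathbf{1}^T \right] \hat{z}, \qquad (4)$$

which is written as $d_{hist} = z^T A z = \hat{z}^T \hat{A} \hat{z}$, where

$$\hat{A} = \left[ A_{N-1} - a_N \mathbf{1}^T - \mathbf{1} a_N^T + a_{NN} \mathbf{1}\mathbf{1}^T \right] \qquad (5)$$

is an $(N-1) \times (N-1)$ modified similarity matrix, and $\mathbf{1}$ is a vector of $N - 1$ ones.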
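A small Python sketch of the reduction (5), together with a numerical check that $z^T A z = \hat{z}^T \hat{A} \hat{z}$, follows. It assumes A is symmetric, reuses the red/orange/blue similarity matrix, and the histograms x and y are made-up values.

```python
def reduce_similarity(A):
    """Reduced (N-1)x(N-1) matrix of (5), assuming A is symmetric:
    A_hat = A_{N-1} - a_N 1^T - 1 a_N^T + a_NN 1 1^T."""
    n = len(A) - 1
    a_N = [A[i][n] for i in range(n)]   # Nth column of A without a_NN
    a_NN = A[n][n]
    return [[A[i][j] - a_N[i] - a_N[j] + a_NN for j in range(n)]
            for i in range(n)]

def quad_form(z, A):
    """z^T A z for plain Python lists."""
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

A = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
x = [0.5, 0.2, 0.3]   # hypothetical normalized histograms
y = [0.1, 0.6, 0.3]
z = [xi - yi for xi, yi in zip(x, y)]
z_hat = z[:-1]        # z_N = -(z_1 + ... + z_{N-1}) is implied
A_hat = reduce_similarity(A)

print(A_hat)                                     # [[2.0, 1.9], [1.9, 2.0]]
print(quad_form(z, A), quad_form(z_hat, A_hat))  # equal up to rounding
```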

The only constraint on $\hat{z}$ is that it lie within the polytope defined by $\sum_i |\hat{z}_i| \le 1$. (For instance, in two dimensions, this defines the unit "diamond" polygon.) Thus, now only a constraint on the $L_1$-norm of $\hat{z}$ is left. Therefore, with the results on the positiveness of $d_{hist}$ shown earlier, (4) implies that

1) $d_{hist} \ge 0$ if and only if $\hat{A}$ is PSD, and

2) for the choice of $a_{ij}$ in (2), $\hat{A}$ is PSD.

As already mentioned, the choice in (3) resulted in the corresponding $\hat{A}$ being PSD in all our examples.

We will call histograms x and y (respectively, the corresponding vector $\hat{z}$) normalized because $\sum_i x_i = \sum_i y_i = 1$ (respectively, $\sum_i |\hat{z}_i| \le 1$), and reduced since they are of dimension $N - 1$.

IV. Average Color Distance

In this section, we present a particularly intuitive distance measure, called the average color distance (the distance between average colors), as a first instance of a cheaper measure that satisfies the requirements outlined in Section II. Subsequently, this distance measure will be generalized to a series of low-dimensional distance measures, each presenting a trade-off between its dimensionality and the number of false hits.

Given that each bin of a color histogram represents a three-dimensional color vector in a suitably defined color space, the average color of an image histogram is defined to be the weighted average color corresponding to the normalized color histogram distribution. Specifically, let $C = [c_1\, c_2 \cdots c_N]$ be a $3 \times N$ matrix whose ith column is the color $c_i = [\alpha_i\, \beta_i\, \gamma_i]^T$, where $\alpha$, $\beta$, and $\gamma$ represent the magnitudes along the three color dimensions (R, G, B, for instance). Given two N-dimensional color histograms, x and y, the $3 \times 1$ average color vector for each is

$$x_{avg} = Cx, \qquad y_{avg} = Cy.$$

The squared average color distance is defined by

$$d_{avg} = (x_{avg} - y_{avg})^T (x_{avg} - y_{avg}) = z^T C^T C z.$$

With $W = C^T C$, $d_{avg} = z^T W z = \hat{z}^T \hat{W} \hat{z}$, where $\hat{z}$ is as defined in Section III and $\hat{W}$ is defined in terms of W similar to $\hat{A}$ of (5). Clearly, by construction, W and $\hat{W}$ are PSD (but not PD, since their rank is at most three).
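For illustration, the average color and $d_{avg}$ can be computed directly from the bin colors, and the result agrees with the quadratic form $z^T W z$ with $W = C^T C$. The RGB coordinates assigned below to the red, orange, and blue bins are hypothetical.

```python
def average_color(hist, colors):
    """x_avg = C x: histogram-weighted mean of the bin colors (a 3-vector)."""
    return [sum(h * c[k] for h, c in zip(hist, colors)) for k in range(3)]

def avg_color_distance(x, y, colors):
    """d_avg = (x_avg - y_avg)^T (x_avg - y_avg)."""
    xa, ya = average_color(x, colors), average_color(y, colors)
    return sum((a - b) ** 2 for a, b in zip(xa, ya))

def via_W(x, y, colors):
    """The same quantity as z^T W z with W = C^T C, i.e. W_ij = c_i . c_j."""
    z = [xi - yi for xi, yi in zip(x, y)]
    n = len(z)
    W = [[sum(a * b for a, b in zip(colors[i], colors[j])) for j in range(n)]
         for i in range(n)]
    return sum(z[i] * W[i][j] * z[j] for i in range(n) for j in range(n))

# Hypothetical RGB coordinates for the red, orange, and blue bins.
colors = [(1.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.0, 0.0, 1.0)]
red, orange = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

print(avg_color_distance(red, orange, colors))  # 0.25
print(via_W(red, orange, colors))               # 0.25
```

The first function needs only the two precomputed 3-vectors, which is why $d_{avg}$ is cheap at query time.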

Note that $d_{avg}$ is defined over 3-dimensional features, $x_{avg}$ and $y_{avg}$, in contrast with N-dimensional (typically 256 or 64) features for $d_{hist}$. Also, the average color can be precomputed (at database population/compilation time) and then organized into an indexable (R*-tree-like) structure. Furthermore, $d_{hist}$ can be bounded from below by a simple function of $d_{avg}$, which ensures that indexing on $d_{avg}$ will be without any false misses. We show this next.
A. Bound Between $d_{hist}$ and $d_{avg}$

If A is PD, then $d_{hist}$ is just a (squared) norm on the histogram space, which is a subset of a linear space. The matrix C maps this vector space into Euclidean 3-space with the standard norm. Thus, $d_{avg}$ is just the (squared) Euclidean norm on the image of the histogram space under the linear map induced by C. In this context, the induced linear map is a bounded linear operator. Consequently,

$$d_{avg} \le \|C\|^2 \, d_{hist},$$

where $\|C\|$ is the norm of this linear operator. In other words, the existence of the required inequality is a consequence of a general theorem [6, p. 57] about finite-dimensional normed vector spaces and bounded linear operators. If, as in our case, neither A nor W is PD but only PSD, then the above results do not apply directly, though they can be extended to this case by restricting to the subspace on which A is PD. However, this does not provide us with a constructive method for computing the multiplier. In the following, we give an alternate proof extending the results to this case and including a simple method for computing the multiplier using standard results in constrained optimization.

THEOREM 1. With $d_{hist}$ and $d_{avg}$ defined as above, if $\hat{A}$ is positive semidefinite, then for all vectors x and y, $d_{hist} \ge \lambda_1 d_{avg}$, where $\lambda_1$ is the minimum eigenvalue of the generalized eigenvalue problem $\hat{A}\hat{z} = \lambda \hat{W} \hat{z}$.

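PROOF. It will be shown that $\hat{z}^T \hat{A} \hat{z} \ge \lambda_1 \hat{z}^T \hat{W} \hat{z}$. Let $\eta > 0$. Because $\hat{A}$ is PSD, there is a unique solution to the constrained minimization problem

$$\min_{\hat{z}^T \hat{W} \hat{z} = \eta} \hat{z}^T \hat{A} \hat{z}. \qquad (6)$$

This solution occurs at a value of $\hat{z}$ where the function $\hat{z}^T \hat{A} \hat{z} - \lambda(\hat{z}^T \hat{W} \hat{z} - \eta)$ has a critical value [13, Section 10.3]. These critical values (in the variables $\hat{z}$ and $\lambda$) occur when $\hat{A}\hat{z} = \lambda \hat{W}\hat{z}$ and $\hat{z}^T \hat{W} \hat{z} = \eta$, and these are regular points because $\eta > 0$. Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{N-1}$ be the generalized eigenvalue solutions of the first equation and $\hat{z}_i$ be any generalized eigenvector also satisfying the second condition. Because the minimum in (6) exists, it must be the case that for some i, $1 \le i \le N - 1$, $\hat{z}^T \hat{A} \hat{z} \ge \hat{z}_i^T \hat{A} \hat{z}_i$ for all $\hat{z}$ such that $\hat{z}^T \hat{W} \hat{z} = \eta$. But the right-hand side of this inequality is just $\lambda_i \eta$. Hence this minimum occurs for $i = 1$, the minimum eigenvalue.

We have shown that

$$\hat{z}^T \hat{A} \hat{z} \ge \lambda_1 \eta = \lambda_1 \hat{z}^T \hat{W} \hat{z}$$

for $\hat{z}$ such that $\hat{z}^T \hat{W} \hat{z} = \eta$, with $\eta > 0$ being completely arbitrary. If $\hat{z}^T \hat{W} \hat{z} = 0$, then this inequality holds a priori because $\hat{A}$ is PSD. Consequently, $d_{hist} \ge \lambda_1 d_{avg}$, as claimed. □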

Note that $\lambda_1$ does not depend on the data, that is, the image histograms, but only on the similarity matrix A and on the definitions of the colors in the matrix C. In our experiments, we used the IMSL routine DGVLRG [9, p. 454] to compute $\lambda_1$. This routine does not require $\hat{W}$ or $\hat{A}$ to be positive definite.
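The paper computes $\lambda_1$ with an IMSL routine; as a swapped-in alternative, the NumPy sketch below obtains it through a Cholesky reduction to a standard symmetric eigenproblem. This only works under the extra assumption that $\hat{A}$ is positive definite (here ensured by using form (3) on distinct, randomly generated colors), and all data are hypothetical. The loop then spot-checks Theorem 1 on random histogram pairs.

```python
import numpy as np

def reduce_matrix(M):
    """Reduced (N-1)x(N-1) matrix, built like A_hat in (5); M is assumed symmetric."""
    a = M[:-1, -1]                                   # last column without M_NN
    return M[:-1, :-1] - a[:, None] - a[None, :] + M[-1, -1]

def min_generalized_eigenvalue(A_hat, W_hat):
    """Smallest lambda with A_hat z = lambda W_hat z (assumes A_hat PD, W_hat PSD).

    With A_hat = L L^T and u = L^T z the problem becomes M u = (1 / lambda) u,
    where M = L^{-1} W_hat L^{-T}; the smallest finite lambda is 1 / max eig(M).
    """
    L_inv = np.linalg.inv(np.linalg.cholesky(A_hat))
    M = L_inv @ W_hat @ L_inv.T
    return 1.0 / np.linalg.eigvalsh(M).max()

# Hypothetical data: 16 random bin colors and similarity form (3) with sigma = 10.
rng = np.random.default_rng(0)
N = 16
colors = rng.random((N, 3))                          # row i is the color c_i
D = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
A = np.exp(-10.0 * (D / D.max()) ** 2)
W = colors @ colors.T                                # W = C^T C with C = colors.T
lam1 = min_generalized_eigenvalue(reduce_matrix(A), reduce_matrix(W))

# Spot check of Theorem 1: d_hist - lambda_1 * d_avg should never be negative.
worst_gap = np.inf
for _ in range(1000):
    x, y = rng.random(N), rng.random(N)
    z = x / x.sum() - y / y.sum()
    worst_gap = min(worst_gap, z @ A @ z - lam1 * (z @ W @ z))
print(lam1, worst_gap)
```

Eigenvalues of `M` equal to zero correspond to directions in the null space of $\hat{W}$ (infinite $\lambda$), so taking the largest eigenvalue of `M` picks out the smallest finite $\lambda$.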

Because both $\hat{A}$ and $\hat{W}$ are PSD, $\lambda_1 \ge 0$. It is indeed possible that $\lambda_1 = 0$, even though this is of no use to us in the application. This will occur when the null space of $\hat{A}$ is not contained in the null space of $\hat{W}$. This case is ignored throughout the rest of the paper. Furthermore, for $\lambda_1 > 0$, C (and so W) could be normalized so that $\lambda_1 = 1$.



As mentioned before, since $d_{hist} \ge \lambda_1 d_{avg}$, for any range query of the form $d_{hist} \le \epsilon$, images retrieved using the filter $d_{avg} \le \epsilon/\lambda_1$ are guaranteed to be a superset of the complete target set. If the filtering returns a relatively small fraction of the entire database, then the expensive $d_{hist}$ needs to be computed only for that small fraction and not for the whole database.

Note that if the 3-dimensional average color for each image, $Cx$, is precomputed and stored, then at query time the average color distance requires only three multiplications and subtractions, and two additions, i.e., it is independent of N, the number of histogram bins.
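The filter-and-refine range query described above can be sketched as follows. This is a minimal NumPy illustration on assumed synthetic data (random bin colors, random skewed histograms, similarity form (3)), with $\lambda_1$ obtained by a Cholesky-based shortcut as a stand-in for the IMSL computation; the function names are hypothetical. The final comparison against a brute-force scan shows what the completeness argument predicts: no true match is lost by the filter.

```python
import numpy as np

def reduce_matrix(M):
    """Reduced (N-1)x(N-1) matrix, as in (5); M is assumed symmetric."""
    a = M[:-1, -1]
    return M[:-1, :-1] - a[:, None] - a[None, :] + M[-1, -1]

def lambda_1(A, W):
    """Smallest generalized eigenvalue of A_hat z = lambda W_hat z (A_hat assumed PD)."""
    L_inv = np.linalg.inv(np.linalg.cholesky(reduce_matrix(A)))
    return 1.0 / np.linalg.eigvalsh(L_inv @ reduce_matrix(W) @ L_inv.T).max()

def range_query(q, hists, avg_colors, A, C, eps, lam1):
    """Indices i with d_hist(q, hists[i]) <= eps, using d_avg <= eps / lam1 as a filter."""
    q_avg = C @ q                                        # 3-D average color of the query
    d_avg = ((avg_colors - q_avg) ** 2).sum(axis=1)      # cheap test against every record
    candidates = np.flatnonzero(d_avg <= eps / lam1)
    hits = [int(i) for i in candidates
            if (q - hists[i]) @ A @ (q - hists[i]) <= eps]  # expensive N-dim test
    return hits, candidates

# Hypothetical database: M random, skewed histograms over N bins.
rng = np.random.default_rng(1)
N, M = 16, 500
colors = rng.random((N, 3))
C = colors.T                                             # 3 x N color matrix
D = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
A = np.exp(-10.0 * (D / D.max()) ** 2)                   # similarity form (3)
hists = rng.random((M, N)) ** 4
hists /= hists.sum(axis=1, keepdims=True)
avg_colors = hists @ C.T                                 # precomputed x_avg, one row per record
lam1 = lambda_1(A, C.T @ C)

q = hists[0]
eps = np.quantile([(q - h) @ A @ (q - h) for h in hists], 0.10)
hits, candidates = range_query(q, hists, avg_colors, A, C, eps, lam1)
brute = [i for i in range(M) if (q - hists[i]) @ A @ (q - hists[i]) <= eps]
print(len(brute), len(candidates), M)
```

The number of candidates minus the number of true hits is the count of "false alarms" that the refine step has to discard.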

V. Experimental Results with $d_{avg}$

The quadratic distance, $d_{hist}$, has been in use in the QBIC system [16] for retrieval of images based on similarity of color histograms. Experiments with a variety of color spaces found that the best performance (using human judgment of the similarity between the query and database images) was obtained using a variant of the Munsell color space. This is a 3-dimensional space with the property that the Euclidean distance between two color vectors corresponds, over a large part of the space, to the perceptual difference between the colors. A transformation from the (R, G, B) space, in which the original images are defined, to the Munsell space, described in [14], was used. In the experiments reported here, 256-dimensional histograms were used.

We performed simulations to evaluate the effectiveness of filtering with $d_{avg}$ on a database of 917 assorted natural images in the QBIC database. The relative retrieval efficiency between two methods is reported: 1) simple sequential evaluation of $d_{hist}$ for all database vectors (referred to as "naive"), and 2) filtering using $d_{avg}$ followed by evaluation of $d_{hist}$ only on those records that pass through the filter (referred to as "filtered"). Indexing methods (such as R*-trees) were not used because the focus was on the gains from the filtering step. The sample queries involved matching each histogram record against the remaining records.

In the ideal case, the filtering step would pass exactly those records for which $d_{hist}$ was in the desired range. It is guaranteed that filtering will include all records for which this is true, but some "false alarms" will also be retrieved. The color similarity matrix A was defined with $a_{ij}$ as in (2). Thus, $\hat{A}$ is guaranteed to be nonnegative definite.




Fig. 1. $d_{hist}$ vs. $d_{avg}$ retrieval.
Fig. 1 shows the extent of the false alarms generated by $d_{avg}$ for the test database. For instance, if the tolerance is set so that the top 10% of the database would have been retrieved by direct search using $d_{hist}$, then the filtering with $d_{avg}$ retrieved approximately 40% of the database. Even this filtering results in a considerable saving, because $d_{hist}$ is applied only to the filtered 40% of the records and not to the complete 100%.

VI. Generalizing Cheap Similarity Measures 

It should be evident from the discussion in Section IV that $d_{avg}$ is not the only distance that leads to low-dimensional features and filtering of the database without any misses. It just happens to be one that results in an intuitively appealing feature, the average color. No particular assumptions were made on the matrix C. Therefore, the question to ask is whether the notion of a cheaper distance measure can be generalized.

In this section, we pose the following question: given a particular number, k, of feature dimensions, what is the optimum k-dimensional index and the corresponding distance? A requirement is that the distance measure should be based on precomputed k-dimensional features, like the average color. Similar to the definition of average color, let $x_k = U_k \hat{x}$ be a k-dimensional feature defined for the $(N-1)$-dimensional normalized and reduced histogram $\hat{x}$, where $U_k$ is a $k \times (N-1)$ matrix. The k-distance (like $d_{avg}$) between two histograms x and y is determined by

$$d_k = d_k(x, y) = (x_k - y_k)^T (x_k - y_k) = \hat{z}^T U_k^T U_k \hat{z} = \hat{z}^T A_k \hat{z},$$

where $A_k = U_k^T U_k$ is a PSD matrix of rank at most k.

The problem of determining the optimal $d_k$ for a fixed k can be formalized as finding the rank-k matrix $A_k$ at which the extremum

$$\inf_{A_k} \sup_{\hat{z}} \hat{z}^T (\hat{A} - A_k) \hat{z} \qquad (7)$$

is attained, subject to

1) $(\hat{A} - A_k)$ being PSD, and

2) $\sum_i |\hat{z}_i| \le 1$.

In words, we want to find the rank-k approximation to $\hat{A}$ such that the maximum difference (over $\hat{z}$) between the true distance and the lower-dimensional distance is a minimum for the given k. Also, because we want $d_k \le d_{hist}$ to avoid false misses (with $\lambda_1 = 1$), $(\hat{A} - A_k)$ should be PSD.

The Singular Value Decomposition (SVD) (see, for example, [4]) of $\hat{A}$ provides a constructive answer to the above problem. Let $\hat{A} = V^T \Sigma V$ be the SVD of $\hat{A}$, where V is an orthogonal matrix and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{N-1})$ with $\sigma_1 \ge \cdots \ge \sigma_{N-1} \ge 0$. Then, a solution to the problem in (7) is given by

$$A_k = V_k^T \Sigma_k V_k, \qquad (8)$$

where $\Sigma_k$ is the $k \times k$ diagonal matrix $\mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ and $V_k$ is the matrix whose rows are the first k rows of V, i.e., corresponding to the first k singular values.
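Because $\hat{A}$ is symmetric and PSD, its SVD coincides with its eigendecomposition, so a NumPy sketch of (8) can use `numpy.linalg.eigh` and reorder the eigenvalues in descending order. The PSD test matrix below is a random stand-in, not the similarity matrix used in the experiments. The checks confirm that $\hat{A} - A_k$ is PSD and that its largest eigenvalue is $\sigma_{k+1}$.

```python
import numpy as np

def rank_k_approximation(A_hat, k):
    """A_k = V_k^T Sigma_k V_k, as in (8), for a symmetric PSD matrix A_hat."""
    sigma, Q = np.linalg.eigh(A_hat)           # ascending eigenvalues, eigenvectors in columns
    order = np.argsort(sigma)[::-1]            # reorder so that sigma_1 >= sigma_2 >= ...
    sigma, V = sigma[order], Q[:, order].T     # rows of V are eigenvectors: A_hat = V^T diag(sigma) V
    V_k = V[:k]
    A_k = V_k.T @ np.diag(sigma[:k]) @ V_k
    return A_k, sigma

# Hypothetical PSD stand-in for the reduced similarity matrix.
rng = np.random.default_rng(2)
B = rng.random((12, 12))
A_hat = B @ B.T
A_k, sigma = rank_k_approximation(A_hat, k=3)
residual = np.linalg.eigvalsh(A_hat - A_k)    # eigenvalues of A_hat - A_k
print(residual.min(), residual.max(), sigma[3])
```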

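To see this, suppose for a moment that the singular values $\sigma_j$, $j = 1, \ldots, N-1$, are not ordered. The results in [4] show that an optimal rank-k approximation $A_k$ to a matrix $\hat{A}$ must have the property that $V A_k V^T$ is diagonal and has entries which come from the diagonal matrix $\Sigma$, i.e., the singular values of $\hat{A}$. Consequently, the extremum in (7) can only be attained if $A_k$ has the form given in equation (8). Now observe that under this assumption

$$d_{hist} - d_k = \hat{z}^T (\hat{A} - A_k) \hat{z} = (V\hat{z})^T (\Sigma - \tilde{\Sigma}_k)(V\hat{z}),$$

where $\tilde{\Sigma}_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)$ is $\Sigma_k$ padded with zeros to size $(N-1) \times (N-1)$.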

From this we easily see that $(\hat{A} - A_k)$ is PSD. Furthermore, the level set $d_{hist} - d_k = 1$ defines a hyperellipsoid in $N - 1 - k$ dimensions whose principal axes are the square roots of the reciprocals of $\sigma_{k+1}, \ldots, \sigma_{N-1}$ (without loss of generality, we assume that these sigmas are nonzero; otherwise the subspace can be further reduced in dimension). The constraint $\sum_i |\hat{z}_i| \le 1$ defines a polytope with corners on each of the $\hat{z}_i$ axes. The maximum of $d_{hist} - d_k$ over such $\hat{z}$ will lie on one of the corners. The value of the maximum will be determined by the smallest principal axis of the ellipsoid, namely $\min_{k+1 \le i \le N-1} \{1/\sqrt{\sigma_i}\}$. The minimum, over all possible orderings of the $\sigma$'s, of this maximum occurs when the smallest axis is largest. That is, the desired extremum occurs when $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{N-1}$, as claimed above. Thus, $A_k$ of (8) is indeed the solution to (7).

A. k-Dimensional Index and $d_k$

The above result creates a method for constructing a k-dimensional index or feature (analogous to the 3-dimensional average color features) for similarity search on N-dimensional histograms. First, off-line, the SVD of the similarity matrix is calculated and a small integer k is chosen. (For best results, the choice of k should depend on the distribution of the significant singular values of the similarity matrix.) This determines the matrices V, $\Sigma$, $V_k$, and $\Sigma_k$. During database population time, an image is processed and the following data are collected: 1) the color histogram with N buckets, 2) the N-dimensional feature vector obtained by multiplying the histogram by V, and 3) the k-dimensional feature vector obtained by multiplying by $V_k$ (more precisely, by truncating the N-dimensional feature vector). The k-dimensional features are organized in the database for search matching on this low-dimensional feature, and the N-dimensional features are stored along with them.

At matching time, the query histogram specification is converted to both the N- and k-dimensional features just as above. For a range query of the type we are dealing with, matching is done first on the k-dimensional features. This requires at most O(k) steps per image or database record. If a database record passes the filter step on this k-dimensional matching, then and only then is the full matching done on the N-dimensional data.
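One possible realization of the population and matching steps is sketched below in NumPy. As an assumption beyond the text, the features are scaled by $\Sigma^{1/2}$ (i.e., computed as $\Sigma^{1/2} V \hat{x}$ rather than $V \hat{x}$) so that plain squared Euclidean distance on the first k coordinates equals $d_k$ and on all coordinates equals $d_{hist}$; the function names and data are hypothetical.

```python
import numpy as np

def reduce_matrix(M):
    """Reduced (N-1)x(N-1) matrix, as in (5); M is assumed symmetric."""
    a = M[:-1, -1]
    return M[:-1, :-1] - a[:, None] - a[None, :] + M[-1, -1]

def build_transform(A_hat):
    """Off-line step: T = Sigma^{1/2} V, rows ordered by descending singular value."""
    sigma, Q = np.linalg.eigh(A_hat)                 # ascending order
    order = np.argsort(sigma)[::-1]
    sigma = np.clip(sigma[order], 0.0, None)         # drop tiny negative round-off
    V = Q[:, order].T
    return np.sqrt(sigma)[:, None] * V

def features(hist, T):
    """Population step: full feature of the reduced histogram x_hat = x[:-1]."""
    return T @ hist[:-1]

def d_k(fx, fy, k):
    """Filter distance on the first k feature coordinates: O(k) work."""
    diff = fx[:k] - fy[:k]
    return float(diff @ diff)

def d_full(fx, fy):
    """Final matching on the stored full features: O(N) work, equals d_hist."""
    diff = fx - fy
    return float(diff @ diff)

# Hypothetical data: 16 random bin colors, similarity form (3), two random histograms.
rng = np.random.default_rng(3)
N, k = 16, 4
colors = rng.random((N, 3))
D = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
A = np.exp(-10.0 * (D / D.max()) ** 2)
T = build_transform(reduce_matrix(A))
x, y = rng.random(N), rng.random(N)
x, y = x / x.sum(), y / y.sum()
fx, fy = features(x, T), features(y, T)
z = x - y
print(d_k(fx, fy, k), d_full(fx, fy), z @ A @ z)
```

Because $d_k$ simply drops nonnegative terms from $d_{full}$, the filter can never exceed the full distance, which is the no-false-miss property in this setting.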

There are two points here. First, we require at most O(k) operations per database record to perform the filter step. This is then followed by the O(N) operations to perform the final matching, but only on the small set of filtered images. Second, by structuring the database on the k-dimensional features in an efficient manner (an R*-tree, for example), the k-dimensional matching need not be performed on every element in the database. This extra efficiency is not possible on high-dimensional features.

Furthermore, in a large database, the cost of retrieving large records can overshadow even the cost of the full distance computation. On the other hand, for low-dimensional searches, even sequential search accesses only small records. Consequently, this filtering method can reduce costs both in disk access and in distance computation. The only assumption here is that $\hat{A}$ has only a few significant singular values, so that the number of false alarms will be relatively small. As is shown in the next section, this is indeed the case for the $\hat{A}$'s that we use in our experiments.
VII. Experimental Results with $d_k$

In this section we present experimental results for the selectivity of the filtering step using $d_k$ for various values of k. The singular values of the matrix $\hat{A}$ that we used range from 150.78 to 0.03; its condition number is 5,026. The trend of the variation in singular values is shown as a plot in Fig. 2. The first 12 singular values (in descending order) of the $255 \times 255$ matrix range from 150.78 to 1.01. All other singular values are less than 1.0.




Fig. 2. First few significant singular values of $\hat{A}$ (horizontal axis: singular value number).



Figs. 3-5 show the retrieval efficiency achieved using the largest 3, 9, and 15 singular values, respectively, to synthesize $A_k$. The figures clearly show the rapid gain in efficiency of filtering with $d_k$ as the dimension of the k-index increases. For instance, for a target retrieval of 10% of the images using $d_{hist}$, the fraction of images retrieved using $d_k$ is about 40% for k = 3, 18% for k = 9, and 15% for k = 15. The efficiency with k = 3 is almost identical to that obtained with $d_{avg}$ (there is probably a reason for this, but we have not investigated it in depth). But with a small increase in the dimensionality to 9, the number of false retrievals falls from 30% to 8%. The marginal efficiency increase tapers off with further increase in k.



Fig. 4. $d_{hist}$ vs. $d_k$ retrieval for k = 9 (horizontal axis: fraction retrieved using $d_{hist}$).

Fig. 6 shows the ratio of retrieval using $d_k$ to retrieval using $d_{hist}$, for 10% retrieval with $d_{hist}$, as a function of k. The asymptotic behavior of the ratio (efficiency) is evident from this figure.

In order to illustrate the efficiency gained in CPU time only, Fig. 7 shows the relative CPU times (time taken for distance computation only), for up to 10% retrieval, between $d_{hist}$ computation over the entire database and $d_{hist}$ computation over only the part of the database filtered using $d_{avg}$. Similarly, Fig. 8 shows the CPU times for filtering with $d_k$ with k = 15. The CPU time for a 256-bin $d_{hist}$ was about 1 ms on an RS6000/350. It is clear from the comparisons that even $d_{avg}$ results in almost 1/3 the CPU time, and $d_k$ (k = 15) results in about 1/6 the CPU time of naive retrieval. It is assumed in this comparison that, both for direct $d_{hist}$ querying and for filtered querying, all the database records are read into memory before the distance computations. Given that the focus of this paper is on the filtering efficiency, a detailed timing comparison with realistic models and simulations of efficient indexing, combined disk access, and CPU times has been left for another time.

Fig. 5. $d_{hist}$ vs. $d_k$ retrieval for k = 15.

Fig. 6. Retrieval ratio with $d_k$ vs. k (10% of database retrieved by $d_{hist}$).

Fig. 7. CPU time for naive retrieval vs. filtered retrieval with $d_{avg}$.

Fig. 8. CPU time for naive retrieval vs. filtered retrieval with $d_k$ for k = 15 (curves: CPU time with filtering, CPU time for naive retrieval; horizontal axis: fraction of database retrieved, 0 to 0.1).



VIII. Conclusions

We presented a method for efficient retrieval of images based on similarity of color histograms. Clearly, the method may also be useful for any other application involving a quadratic distance measure. The results indicate that the simple filtering method described here easily outperforms the naive histogram comparison method. Using low-dimensional indexing techniques (e.g., the k-index) would certainly make our method more attractive for very large database applications.

In the future, we plan to understand analytically the ratio of retrievals using $d_k$ to those using $d_{hist}$. Such an analysis would require assumptions on the distributions of the histograms themselves, for instance, that they are drawn from a uniform distribution. We would also like to verify our conjecture that $d_{hist}$ is indeed nonnegative for the choice of $a_{ij}$ in (3). Finally, we will build an indexable color image database using the k-index and demonstrate the extra efficiency in retrieval times.
Abstract

In image retrieval based on color, the weighted distance between color histograms of two images, represented as a quadratic form, may be defined as a match measure. However, this distance measure is computationally expensive (naively $O(N^2)$ and at best $O(N)$ in the number N of histogram bins) and it operates on high-dimensional features ($O(N)$). We propose the use of low-dimensional, simple-to-compute distance measures between the color distributions, and show that these are lower bounds on the histogram distance measure. Results on color histogram matching in large image databases show that prefiltering with the simpler distance measures leads to significantly less time complexity, because the quadratic histogram distance is now computed on a smaller set of images. The low-dimensional distance measure can also be used for indexing into the database.

1. INTRODUCTION 

In the query by image content (or QBIC) project, we are developing a system for [...] and retrieval of images from a large database based on their content, defined in terms of shapes, colors, textures, and user [...] [12].

In this paper we present efficient methods for retrieving images based on their color content. A distance, or more precisely a pseudo-metric or pseudo-norm, usually represented by a quadratic form, between a query image color histogram and a data image histogram can be used to define the similarity match between the two color distributions. However, because the histogram is typically a high-dimensional distribution (N = 256 or 64 colors, for instance), the distance measure is computationally expensive. A naive implementation of this as a quadratic form requires $O(N^2)$ operations, though this can be improved to $O(N)$ by some precomputations. Also, indexing on such high-dimensional features is typically not feasible. Moreover, for large image databases, it is generally not feasible to compute the match measure against every image ($O(M)$ computation if M is the size of the database). One has to generate low-dimensional indices so that retrieval involves only $O(\log M)$ comparisons. Even efficient data structures for database indexing, like R*-trees [13], work well for only up to 20 dimensions. Furthermore, the choice of low-dimensional features should satisfy the completeness [1] property. That is, the features should not lead to any false misses. Efficiency should not be at the expense of correctness.

We propose the use of low-dimensional, simple-to-compute distance measures between the color distributions, and show that these are lower bounds on the histogram distance measure in certain fairly general cases. Thus, similarity retrieval based on the cheaper measure achieves both the goals of low dimensionality and completeness. This applies to any matching problem involving a quadratic form distance measure between two distributions.

2. THE PROBLEM 

Let x and y be two N-dimensional distributions, color histograms for instance. For retrieval based on similarity of the two distributions, a distance (defined here to be a pseudo-metric or pseudo-norm) between the two can be defined as a match measure. A weighted form of such a distance measure can be represented as a quadratic form:

$$d_{hist}(x, y) = (x - y)^T A (x - y), \qquad (1)$$

where $A = [a_{ij}]$ is a matrix and the weights $a_{ij}$ ($0 \le a_{ij} \le 1$, $a_{ii} = 1$) denote similarity between bins i and j. The larger $a_{ij}$, the more similar bins i and j are. The two distributions can also be normalized so that $0 \le x_i, y_i \le 1$ and $\sum_i x_i = \sum_i y_i = 1$. Note that $d_{hist}$ can be a positive semidefinite form even when A is an indefinite matrix, because of the normalization constraints on the distributions. We assume that the above equations indeed define a positive semidefinite form. It can be shown that under certain conditions on the $a_{ij}$ it always has this property [5]. Besides, in any specific application or choice of matrix A, this can be easily tested numerically.

In our case $d_{hist}$ represents a generalized quadratic histogram distance measure for color matching. Typically, 256 colors are adequate to capture the color distributions of most natural scenes. In contrast with direct Euclidean distance, the general quadratic form allows for similarity matching between different colors (represented by the histogram bins). Each entry $a_{ij}$ in the "similarity matrix" A attempts to capture the perceptual similarity between the colors represented by bins i and j. This method of comparing histograms is more sophisticated than currently popular methods such as that of Swain [14].

However, as has already been noted, the histogram quadratic form measure is computationally intensive and it operates on high-dimensional features. The problem at hand is to define a considerably less expensive measure on considerably lower-dimensional features. We show that, under fairly general conditions, the cheaper distance, call it $d_a$ for now, can be bounded by $d_{hist}$ in the form $\lambda d_a \le d_{hist}$, for some positive constant $\lambda$. Thus, in order to retrieve images satisfying $d_{hist} \le \epsilon$, the inequality $d_a \le \epsilon/\lambda$ can be used to retrieve images quickly and without misses. The expensive measure $d_{hist}$ will then have to be applied only to the filtered set of images.

If the similarity information is ignored, then the problem is a straightforward distribution matching problem. Swain [14] treats his color matching problem in this way, and uses $L_1$ distance as a measure of the distance between two histograms. Other obvious candidates are the "relative entropy" or "Kullback-Leibler" distance [2] or the distance implied by the standard chi-squared statistical test [11]. Ioka [8] used the histogram quadratic form distance, but did not use the cheaper distance as a lower bound to efficiently filter out unwanted records.

3. HISTOGRAM DISTANCE 

Given the N-dimensional normalized distributions x and y, if $z = x - y$, then $-1 \le z_i \le 1$, $\sum_i z_i = 0$, and $d_{hist}(x, y) = z^T A z$. We call the set of such z's the histogram space (though it is not technically a linear space). Without loss of generality, A can be assumed to be a symmetric matrix.
An example (from [4]) will illustrate why similarity-weighted comparison between color histograms leads to perceptually desirable results in contrast with comparisons based on direct Euclidean distance between the distributions. For simplicity, consider a histogram distribution of three colors, say red, orange, and blue, with

$$A = \begin{bmatrix} 1.0 & 0.9 & 0.0 \\ 0.9 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{bmatrix},$$

where red and orange are considered highly similar. Consider a pure red image, $x = [1.0, 0.0, 0.0]^T$, and a pure orange image, $y = [0.0, 1.0, 0.0]^T$. The histogram distance of equation (1) is 0.2. This low distance reflects the perceptual similarity of the two images, although their distributions populate distinct bins of the histogram, so that their squared Euclidean distance is 2.0.

It can be shown that for certain choices of A, $d_{hist}$ will be nonnegative on the histogram space [5]. With $d_{max} = \max_{ij}(d_{ij})$, one such choice is $a_{ij} = 1 - d_{ij}/d_{max}$. Other choices of similarity metrics that capture perceptual similarity of colors and make $d_{hist} \ge 0$ are valid also. We have used the above value in experiments on real images.

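Now, $d_{hist}$ can be written in a different form by eliminating the constraint $\sum_i z_i = 0$:

$$d_{hist} = z^T A z = \begin{bmatrix} \hat{z}^T & z_N \end{bmatrix} \begin{bmatrix} A_{N-1} & a_N \\ a_N^T & a_{NN} \end{bmatrix} \begin{bmatrix} \hat{z} \\ z_N \end{bmatrix},$$

where $\hat{z}^T = [z_1 \cdots z_{N-1}]$ and A has been decomposed into its top left $(N-1) \times (N-1)$ component $A_{N-1}$ and the rest ($a_N$ being the Nth column of A less the last entry $a_{NN}$). Applying $\sum_i z_i = 0$ (i.e., $z_N = -\mathbf{1}^T \hat{z}$) to this, we get

$$d_{hist} = \hat{z}^T \left[ A_{N-1} - a_N \mathbf{1}^T - \mathbf{1} a_N^T + a_{NN} \mathbf{1}\mathbf{1}^T \right] \hat{z}, \qquad (2)$$

which is written as $d_{hist} = z^T A z = \hat{z}^T \hat{A} \hat{z}$, where

$$\hat{A} = \left[ A_{N-1} - a_N \mathbf{1}^T - \mathbf{1} a_N^T + a_{NN} \mathbf{1}\mathbf{1}^T \right] \qquad (3)$$

is an $(N-1) \times (N-1)$ modified similarity matrix, and $\mathbf{1}$ is a vector of $N - 1$ ones.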



4. AVERAGE COLOR DISTANCE 

We now present a particularly intuitive distance measure, called the average color distance (the distance between average colors), as a first instance of a cheaper measure.



Given that each bin of a color histogram represents a three-dimensional color vector in a suitably defined color space, the average color of an image histogram is defined to be the weighted average color corresponding to the normalized color histogram distribution. Specifically, let $C = [c_1\, c_2 \cdots c_N]$ be a $3 \times N$ matrix whose ith column is the color $c_i = [\alpha_i\, \beta_i\, \gamma_i]^T$, where $\alpha$, $\beta$, and $\gamma$ represent the magnitudes along the three color dimensions (R, G, B, for instance). Given two N-dimensional color histograms, x and y, the $3 \times 1$ average color vector for each is

$$x_{avg} = Cx, \qquad y_{avg} = Cy.$$

The average color distance is defined by

$$d_{avg} = (x_{avg} - y_{avg})^T (x_{avg} - y_{avg}) = z^T C^T C z.$$

With $W = C^T C$, $d_{avg} = z^T W z = \hat{z}^T \hat{W} \hat{z}$, where $\hat{z}$ is as defined in Section 3 and $\hat{W}$ is defined in terms of W similar to $\hat{A}$ of equation (3). Clearly, by construction, W and $\hat{W}$ are PSD (but not PD, since their rank is at most three).

Note that $d_{avg}$ is defined over 3-dimensional features, $x_{avg}$ and $y_{avg}$, in contrast with N-dimensional (typically 256 or 64) features for $d_{hist}$. Also, the average color can be precomputed (at database population/compilation time) and then organized into an indexable (R*-tree-like) structure. Furthermore, $d_{hist}$ can be bounded from below by a simple function of $d_{avg}$, which ensures that indexing on $d_{avg}$ will be without any false misses.

4.1. Bound between $d_{hist}$ and $d_{avg}$

If A is PD, then $d_{hist}$ is just a (squared) norm on the histogram space, which is a subset of a linear space. The matrix C maps this vector space into Euclidean 3-space with the standard norm. Thus, $d_{avg}$ is just the (squared) Euclidean norm on the image of the histogram space under the linear map induced by C. In this context, the induced linear map is a bounded linear operator. Consequently,

$$d_{avg} \le \|C\|^2 \, d_{hist},$$

where $\|C\|$ is the norm of this linear operator. In other words, the existence of the required inequality is a consequence of a general theorem [6, pg. 57] about finite-dimensional normed vector spaces and bounded linear operators. If, as in our case, neither A nor W is PD but only PSD, then the above results do not apply directly. In the following, we extend the result to this case and give a simple method for computing the multiplier in the inequality using standard results in constrained optimization.

Theorem 1. With $d_{hist}$ and $d_{avg}$ defined as above, if $\hat{A}$ is positive semi-definite, then for all vectors $x$ and $y$, $d_{hist} \ge \lambda_1 d_{avg}$, where $\lambda_1$ is the minimum eigenvalue of the generalized eigenvalue problem $\hat{A}\hat{z} = \lambda \hat{W}\hat{z}$.

Proof: It will be shown that $\hat{z}^T \hat{A} \hat{z} \ge \lambda_1 \hat{z}^T \hat{W} \hat{z}$. Let $\eta > 0$. Because $\hat{A}$ is PSD there is a unique solution to the constrained minimisation problem

$$\min_{\hat{z}\,:\,\hat{z}^T \hat{W} \hat{z} = \eta} \hat{z}^T \hat{A} \hat{z}. \qquad (4)$$

This solution occurs at a value of $\hat{z}$ where the function $\hat{z}^T \hat{A} \hat{z} - \lambda(\hat{z}^T \hat{W} \hat{z} - \eta)$ has a critical value [9, Section 10.3]. These critical values (in the variables $\hat{z}$ and $\lambda$) occur when $\hat{A}\hat{z} = \lambda \hat{W}\hat{z}$ and $\hat{z}^T \hat{W} \hat{z} = \eta$, and these are regular points because $\eta > 0$. Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{N-1}$ be the generalised eigenvalue solutions of the first equation and $\hat{z}_i$ be any generalised eigenvector also satisfying the second condition. Because the minimum in (4) exists, it must be the case that for some $i$, $1 \le i \le N-1$, $\hat{z}^T \hat{A} \hat{z} \ge \hat{z}_i^T \hat{A} \hat{z}_i$ for all $\hat{z}$ such that $\hat{z}^T \hat{W} \hat{z} = \eta$. But the right-hand side of this inequality is just $\lambda_i \eta$. Hence this minimum occurs for $i = 1$, the minimum eigenvalue.

We have shown that

$$\hat{z}^T \hat{A} \hat{z} \ge \lambda_1 \eta = \lambda_1 \hat{z}^T \hat{W} \hat{z}$$

for $\hat{z}$ such that $\hat{z}^T \hat{W} \hat{z} = \eta$, with $\eta > 0$ being completely arbitrary. If $\hat{z}^T \hat{W} \hat{z} = 0$, then this inequality holds a priori because $\hat{A}$ is PSD. Consequently, $d_{hist} \ge \lambda_1 d_{avg}$, as claimed. □

Note that $\lambda_1$ does not depend on the data, that is, the image histograms, but only on the similarity matrix $A$ and on the definitions of the colors in the matrix $C$. In our experiments, we used the IMSL routine DGVLRG [7, p. 454] to compute $\lambda_1$. This routine does not require $\hat{W}$ or $\hat{A}$ to be positive definite.
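The text computes $\lambda_1$ with DGVLRG. As a hedged alternative for the special case where the similarity matrix is positive definite (an assumption; the theorem only needs PSD), $\lambda_1$ can be obtained from a $3 \times 3$ problem: $\max_z z^T W z / z^T A z$ equals the largest eigenvalue of $C A^{-1} C^T$, so $\lambda_1$ is its reciprocal. This reformulation is ours, not the paper's method. The sketch works with a full $N \times N$ matrix rather than the reduced $\hat{A}$, uses a randomly generated hypothetical similarity matrix, and spot-checks the inequality of Theorem 1.

```python
import numpy as np

def lambda_1(A, C):
    """Smallest finite eigenvalue of A z = lambda W z with W = C^T C.

    Assumes A is symmetric positive definite. Then
    max_z (z^T W z) / (z^T A z) equals the largest eigenvalue of the
    3 x 3 matrix C A^{-1} C^T, and lambda_1 is its reciprocal.
    """
    M = C @ np.linalg.solve(A, C.T)          # C A^{-1} C^T
    return 1.0 / np.linalg.eigvalsh(M).max()

rng = np.random.default_rng(1)
N = 8
C = rng.random((3, N))                       # hypothetical bin colors
B = rng.random((N, N))
A = B @ B.T + N * np.eye(N)                  # hypothetical PD similarity matrix
W = C.T @ C
lam1 = lambda_1(A, C)

# Spot-check Theorem 1: z^T A z >= lambda_1 * z^T W z for random z.
for _ in range(1000):
    z = rng.standard_normal(N)
    assert z @ A @ z >= lam1 * (z @ W @ z) - 1e-9
```

The bound is tight: with $v$ the top eigenvector of $C A^{-1} C^T$, the vector $z^* = A^{-1} C^T v$ attains equality, which is why no larger multiplier can be used without risking false misses.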

Because both $\hat{A}$ and $\hat{W}$ are PSD, $\lambda_1 \ge 0$. It is indeed possible that $\lambda_1 = 0$, even though this case is of no use to us; it occurs when the null space of $\hat{A}$ is not contained in the null space of $\hat{W}$. This case is ignored throughout the rest of the paper. Furthermore, for $\lambda_1 > 0$, $C$ (and so $W$) could be normalised so that $\lambda_1 = 1$.

Since $d_{hist} \ge \lambda_1 d_{avg}$, for any range query of the form $d_{hist} < \epsilon$, the images retrieved using the filter $d_{avg} < \epsilon / \lambda_1$ are guaranteed to be a superset of the complete target set. If the filtering returns a relatively small fraction of the entire database, then the expensive $d_{hist}$ needs to be computed only for that small fraction and not for the whole database.
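The filter-then-verify range query described above can be sketched in NumPy as follows. This is a minimal illustration, not the QBIC implementation: the function name, the random histograms and the positive definite similarity matrix are hypothetical, and $\lambda_1$ is computed with the $3 \times 3$ shortcut that assumes a PD matrix. The final check compares the filtered result with a naive scan.

```python
import numpy as np

def filtered_range_query(A, C, lam1, query, database, eps):
    """Indices i with d_hist(query, database[i]) < eps, using d_avg as a pre-filter.

    Because d_hist >= lam1 * d_avg, the filter d_avg < eps / lam1 cannot
    discard a true match. Returns (hits, number of candidates examined).
    """
    avg = database @ C.T                      # average colors; precomputable
    d_avg = ((avg - C @ query) ** 2).sum(axis=1)
    candidates = np.flatnonzero(d_avg < eps / lam1)
    hits = []
    for i in candidates:                      # full quadratic form only on survivors
        z = query - database[i]
        if z @ A @ z < eps:
            hits.append(int(i))
    return hits, len(candidates)

# Hypothetical data: 200 normalized 16-bin histograms and a PD similarity matrix.
rng = np.random.default_rng(2)
N, n = 16, 200
C = rng.random((3, N))
B = rng.random((N, N))
A = B @ B.T / N + np.eye(N)
lam1 = 1.0 / np.linalg.eigvalsh(C @ np.linalg.solve(A, C.T)).max()  # assumes A is PD
database = rng.random((n, N))
database /= database.sum(axis=1, keepdims=True)
query = database[0]

Z = query - database
naive = np.einsum("ij,jk,ik->i", Z, A, Z)     # d_hist against every record
eps = np.quantile(naive, 0.10)                # tolerance for roughly 10% retrieval
hits, n_candidates = filtered_range_query(A, C, lam1, query, database, eps)
print(len(hits), n_candidates, n)
```

The number of candidates minus the number of hits is the count of "false alarms" that the experiments below measure.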

If the 3-dimensional average color $Cx$ for each image is precomputed and stored, then at query time the average color distance requires only three multiplications and subtractions, and two additions; that is, it is independent of $N$, the number of histogram bins.

5. EXPERIMENTAL RESULTS WITH $d_{avg}$

The quadratic distance, $d_{hist}$, has been in use in the QBIC system [12] for retrieval of images based on similarity of color histograms. Experiments with a variety of color spaces found that the best performance (using human judgement of the similarity between the query and database images) was obtained using a variant of the Munsell color space. This is a 3-dimensional space with the property that the Euclidean distance between two color vectors corresponds, over a large part of the space, to the perceptual difference between the colors. A transformation from the $(R, G, B)$ space, in which the original images are defined, to the Munsell space, described in [10], was used. The histograms used are 256-dimensional.

We performed simulations to evaluate the effectiveness of filtering with $d_{avg}$ on a database of 917 assorted natural images in the QBIC database. The relative retrieval efficiency between two methods is reported: (i) simple sequential evaluation of $d_{hist}$ for all database vectors (referred to as 'naive'), and (ii) filtering using $d_{avg}$ followed by evaluation of $d_{hist}$ only on those records that pass through the filter (referred to as 'filtered'). Indexing methods (such as R*-trees) were not used because the focus was on the gains from the filtering step. The sample queries involved matching each histogram record against the remaining records.

In the ideal case, the filtering step would select exactly those records for which $d_{hist}$ was in the desired range. It is guaranteed that filtering will include all records for which this is true, but some "false alarms" will also be retrieved. $A$ was defined with $a_{ij}$ as in Section 3, guaranteeing it to be non-negative definite.




Figure 1: $d_{hist}$ vs. $d_{avg}$ retrieval.

Figure 1 shows the extent of the false alarms generated by $d_{avg}$ for the test database. For instance, if the tolerance is set so that the top 10% of the database would have been retrieved by direct search using $d_{hist}$, then filtering with $d_{avg}$ retrieved approximately 40% of the database. Even this filtering results in a considerable saving, because $d_{hist}$ is applied only to the filtered 40% of the records and not to the complete 100%.

6. CHEAPER SIMILARITY MEASURES 

The function $d_{avg}$ is not the only distance that leads to low-dimensional features and filtering of the database without any misses. It just happens to be one that results in an intuitively appealing feature, the average color. No particular assumptions were made on the matrix $C$. Therefore, the question to ask is whether the notion of a cheaper distance measure can be generalised.

Given a particular number, $k$, of feature dimensions, what is the optimum $k$-dimensional index and the corresponding distance? A requirement is that the distance measure should be based on precomputed $k$-dimensional features, like the average color. Similar to the definition of average color, let $\hat{x}_k = U_k \hat{x}$ be a $k$-dimensional feature defined for the $(N-1)$-dimensional normalised and reduced histogram $\hat{x}$, where $U_k$ is a $k \times (N-1)$ matrix. The $k$-distance (like $d_{avg}$) between two histograms $x$ and $y$ is given by:

$$d_k = d_k(x, y) = (\hat{x}_k - \hat{y}_k)^T (\hat{x}_k - \hat{y}_k) = \hat{z}^T U_k^T U_k \hat{z} = \hat{z}^T \hat{A}_k \hat{z},$$

where $\hat{A}_k = U_k^T U_k$ is a PSD matrix of rank at most $k$.

The problem of determining the optimal $d_k$ for a fixed $k$ can be formalised as finding the rank-$k$ matrix $\hat{A}_k$ at which the extremum

$$\min_{\hat{A}_k} \sup_{\hat{z}} \hat{z}^T (\hat{A} - \hat{A}_k) \hat{z} \qquad (5)$$

is attained, subject to (i) $(\hat{A} - \hat{A}_k)$ being PSD, and (ii) $\sum_i |\hat{z}_i| \le 1$. In words, we want to find the rank-$k$ approximation to $\hat{A}$ such that the maximum difference (over $\hat{z}$) between the true distance and the lower-dimensional distance is a minimum for the given $k$. Also, because we want $d_{hist} \ge d_k$ to avoid false misses (with $\lambda_1 = 1$), $(\hat{A} - \hat{A}_k)$ should be PSD.

The Singular Value Decomposition (SVD) (see, for example, [3]) of $\hat{A}$ provides a constructive answer to the above problem. Let $\hat{A} = V^T \Sigma V$ be the SVD of $\hat{A}$, where $V$ is an orthogonal matrix and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{N-1})$ with $\sigma_1 \ge \cdots \ge \sigma_k \ge \cdots \ge \sigma_{N-1} \ge 0$. Then, a solution to the problem in (5) is given by

$$\hat{A}_k = V_k^T \Sigma_k V_k, \qquad (6)$$

where $\Sigma_k$ is the $k \times k$ diagonal matrix $\mathrm{diag}(\sigma_1, \ldots, \sigma_k)$, and $V_k$ is the matrix whose rows are the first $k$ rows of $V$, i.e., those corresponding to the first $k$ singular values.
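Equation (6) can be sketched in NumPy as below. Because $\hat{A}$ is symmetric PSD, its SVD coincides with its eigendecomposition, so the sketch uses `np.linalg.eigh` and reorders the eigenvalues in descending order; the function name and the random test matrix are hypothetical. It checks that $\hat{A} - \hat{A}_k$ is PSD and that its largest eigenvalue is $\sigma_{k+1}$, the first discarded singular value.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Keep the k largest singular values of a symmetric PSD matrix A.

    For symmetric PSD A the SVD equals the eigendecomposition, so we write
    A = V^T diag(sigma) V with the rows of V being eigenvectors.
    Returns A_k = V_k^T Sigma_k V_k together with V_k and sigma_k.
    """
    sigma, vecs = np.linalg.eigh(A)        # ascending eigenvalues, eigenvectors in columns
    order = np.argsort(sigma)[::-1]        # descending, as in the text
    sigma, V = sigma[order], vecs[:, order].T
    V_k, sigma_k = V[:k], sigma[:k]
    A_k = V_k.T @ np.diag(sigma_k) @ V_k
    return A_k, V_k, sigma_k

rng = np.random.default_rng(3)
B = rng.random((10, 10))
A = B @ B.T                                # hypothetical symmetric PSD matrix
A_k, V_k, sigma_k = rank_k_approximation(A, 3)
residual = np.linalg.eigvalsh(A - A_k)
print(residual.min() >= -1e-9)             # A - A_k is PSD (up to rounding)
```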

To see this, suppose for a moment that the singular values $\sigma_j$, $j = 1, \ldots, N-1$, are not ordered. The results in [3] show that an optimal rank-$k$ approximation $\hat{A}_k$ to a matrix $\hat{A}$ must have the property that $V \hat{A}_k V^T$ is diagonal, with entries that come from the diagonal matrix $\Sigma$, i.e., the singular values of $\hat{A}$. Consequently, the extremum in (5) can only be attained if $\hat{A}_k$ has the form given in equation (6). Now observe that under this assumption

$$d_{diff} = \hat{z}^T (\hat{A} - \hat{A}_k) \hat{z} = (V\hat{z})^T (\Sigma - \bar{\Sigma}_k) (V\hat{z}),$$

where $\bar{\Sigma}_k = \begin{bmatrix} \Sigma_k & 0 \\ 0 & 0 \end{bmatrix}$. From this we easily see that $(\hat{A} - \hat{A}_k)$ is PSD. Furthermore, the level set $d_{diff} = 1$ defines a hyperellipsoid in $N-1-k$ dimensions whose principal axes are the square roots of the reciprocals of $\sigma_{k+1}, \ldots, \sigma_{N-1}$ (without loss of generality, we assume that these sigmas are non-zero, else the subspace can be further reduced in dimension). The constraint $\sum_i |\hat{z}_i| \le 1$ defines a polytope with corners on each of the $\hat{z}_i$ axes. The maximum of $d_{diff}$ over such $\hat{z}$ will lie on one of the corners. The value of the maximum will be determined by the smallest principal axis of the ellipsoid, namely $\min_{k+1 \le l \le N-1} \{1/\sqrt{\sigma_l}\}$. The minimum of this maximum over all possible orderings of the $\sigma_i$ occurs when the smallest axis is largest. That is, the desired extremum occurs when $\sigma_1 \ge \cdots \ge \sigma_k \ge \cdots \ge \sigma_{N-1}$, as claimed above. Thus, $\hat{A}_k$ of (6) is indeed the solution to (5).

6.1. $k$-dimensional Index and $d_k$

The above result creates a method for constructing a $k$-dimensional index (or feature) for similarity search of $N$-dimensional histograms. In the spirit of the average color (a 3-dimensional feature or index), a $k$-index for an $N$-dimensional histogram, $x$, can be defined as $\hat{x}_k = \Sigma_k^{1/2} V_k \hat{x}$. In practice, $k < N$ can be chosen arbitrarily, though its choice should depend on the distribution of the significant singular values $\sigma_i$. For each image, $\hat{x}_k$ can be precomputed and stored in the database. Just as with the average color, the low-dimensional features $\hat{x}_k$ can be organized in some database structure to optimise the search at query time. In any case, $d_k$ becomes a filter for $d_{hist}$.
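A minimal sketch of the $k$-index follows, assuming a symmetric PSD similarity matrix and working directly with $N$-dimensional histograms instead of the reduced $(N-1)$-dimensional ones; the names and data are hypothetical. It forms $U_k = \Sigma_k^{1/2} V_k$, so that $U_k^T U_k = \hat{A}_k$, and checks that the feature-space distance equals $\hat{z}^T \hat{A}_k \hat{z}$ and never exceeds $d_{hist}$.

```python
import numpy as np

def k_index(A, k):
    """Return U_k = Sigma_k^{1/2} V_k, so that U_k^T U_k is the rank-k part of A."""
    sigma, vecs = np.linalg.eigh(A)
    order = np.argsort(sigma)[::-1][:k]              # indices of the k largest
    return np.sqrt(np.clip(sigma[order], 0, None))[:, None] * vecs[:, order].T

rng = np.random.default_rng(5)
N, k = 12, 4
B = rng.random((N, N))
A = B @ B.T                              # hypothetical PSD similarity matrix
U_k = k_index(A, k)
x = rng.random(N); x /= x.sum()
y = rng.random(N); y /= y.sum()
x_k, y_k = U_k @ x, U_k @ y              # precomputable k-dimensional features
d_k = float((x_k - y_k) @ (x_k - y_k))
z = x - y
d_hist = float(z @ A @ z)
print(d_k, d_hist)                       # d_k never exceeds d_hist
```

Per image only the $k$ numbers in `x_k` need to be stored; `U_k` is computed once from the similarity matrix.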

Furthermore, in a large database, the cost of retrieving large records can overshadow even the cost of the full distance computation. On the other hand, for low-dimensional searches, even sequential search involves only small records. Consequently, this filtering method can reduce costs both in disk access and in distance computation. The only assumption here is that $\hat{A}$ has only a few significant singular values, so that the number of false alarms will be relatively small. As is shown in the next section, this is indeed the case for the $\hat{A}$ that we use in our experiments.

7. EXPERIMENTAL RESULTS WITH $d_k$

We present experimental results for the selectivity of the filtering step using $d_k$ for various values of $k$. The singular values of the matrix $\hat{A}$ which we used range from 150.78 to 0.03. The first 12 singular values (in descending order) of the 255 × 255 matrix are 150.78, 20.32, 9.81, 7.07, 3.73, 2.45, 1.96, 1.54, 1.31, 1.15, 1.01. All other singular values are less than 1.0.




Figure 2: $d_{hist}$ vs. $d_k$ retrieval for $k = 3$.




Figure 3: $d_{hist}$ vs. $d_k$ retrieval for $k = 15$.

Figures 2 and 3 show the retrieval efficiency achieved using the largest 3 and 15 singular values, respectively, to synthesize $\hat{A}_k$. The figures clearly show the rapid gain in efficiency of filtering with $d_k$ as the dimension of the $k$-index increases. For instance, for a target retrieval of 10% of the images using $d_{hist}$, the fraction of images retrieved using $d_k$ is about 40% for $k = 3$, and 15% for $k = 15$. The efficiency with $k = 3$ is almost identical to that obtained with $d_{avg}$ (there is probably a reason for this, but we have not investigated it in depth). But with an increase in the dimensionality to 15, the number of false retrievals falls from 30% to 8%. The marginal efficiency increase tapers off with further increase in $k$.

Figure 4 shows the ratio of retrieval using $d_k$ to retrieval using $d_{hist}$, for 10% retrieval with $d_{hist}$, as a function of $k$. The asymptotic behavior of the ratio (efficiency) is evident from this figure.




Figure 4: Retrieval ratio with $d_k$ to $d_{hist}$ vs. $k$ (10% of database retrieved by $d_{hist}$).




Figure 5: CPU time for naive retrieval vs. filtered retrieval with $d_{avg}$.

In order to illustrate the efficiency gained in CPU time only, Figure 5 shows the relative CPU times (time taken for distance computation only), for up to 10% retrieval, between $d_{hist}$ computation over the entire database and $d_{hist}$ computation over only the part of the database filtered using $d_{avg}$. Similarly, Figure 6 shows the CPU times for filtering with $d_k$ with $k = 15$. The CPU time for a 256-bin $d_{hist}$ was about 1 ms on an RS6000/350. It is clear from the comparisons that even for retrievals as high as 10% of the database, $d_{avg}$ filtering results in almost 1/3rd the CPU time, and $d_k$ ($k = 15$) results in about 1/6th the CPU time of naive retrieval. It is assumed in this comparison that, both for direct $d_{hist}$ querying and filtered querying, all the database records are read into memory before the distance computations. Given that the focus of this paper is on the filtering efficiency, a detailed timing comparison with realistic models and simulations of efficient indexing, combined disk access and CPU times has been left for another time.

8. CONCLUSIONS 

We presented a method for efficient retrieval of images based on similarity of color histograms. Clearly, the method may also be useful for any other application involving a quadratic distance measure. The results indicate that the simple filtering method described here easily outperforms the naive histogram comparison method. Using low-dimensional indexing techniques (e.g., a $k$-index) would certainly make our method more attractive for very large database applications.

Figure 6: CPU time for naive retrieval vs. filtered retrieval with $d_k$ for $k = 15$.
Abstract

Color histograms are widely used for content-based image retrieval. Their advantages are efficiency and insensitivity to small changes in camera viewpoint. However, a histogram is a coarse characterization of an image, and so images with very different appearances can have similar histograms. We describe a technique for comparing images called histogram refinement, which imposes additional constraints on histogram-based matching. Histogram refinement splits the pixels in a given bucket into several classes, based upon some local property. Within a given bucket, only pixels in the same class are compared. We describe a split histogram called a color coherence vector (CCV), which partitions each histogram bucket based on spatial coherence. CCV's can be computed at over 5 images per second on a standard workstation. A database with 15,000 images can be queried using CCV's in under 2 seconds. We demonstrate that histogram refinement can be used to distinguish images whose color histograms are indistinguishable.

1 Introduction 

Many applications require methods for comparing 
images based on their overall appearance. Color his- 
tograms are a popular solution to this problem, and 
are used in systems like QBIC [2] and Chabot [6]. 
Color histograms are computationally efficient, and 
generally insensitive to small changes in camera po- 
sition. However, a color histogram provides only a 
very coarse characterization of an image; images with 
similar histograms can have dramatically different ap- 
pearances. For example, the images shown in figure 1 
have similar color histograms. 

In this paper we describe a method which imposes additional constraints on histogram-based matching. In histogram refinement, the pixels within a given bucket are split into classes based upon some local property. Split histograms are compared on a bucket-by-bucket basis, similar to standard histogram matching. Within a given bucket, only pixels with the same property are compared. Two images with identical color histograms can have different split histograms; thus, split histograms create a finer distinction than color histograms. This is particularly important for large image databases, in which many images can have similar color histograms.




Figure 1: Two images with similar color histograms 

We have experimented with a split histogram called 
a color coherence vector (CCV), which partitions pix- 
els based upon their spatial coherence. A coherent 
pixel is part of some sizable contiguous region, while 
an incoherent pixel is not. While the two images 
shown in figure 1 have similar color histograms, their 
CCVs are very different.¹ For example, red pixels
appear in both images in similar quantities. In the 
left image the red pixels (from the flowers) are widely 
scattered, while in the right image the red pixels (from 
the golfer's shirt) form a single coherent region. 

We begin with a review of color histograms. In sec- 
tion 3 we describe histogram refinement, and present 
two examples that capture spatial information. Sec- 
tion 4 provides examples of refinement-based image 
queries and shows that they can give superior results 
to color histograms. We compare our work with some
recent algorithms [5, 8, 9, 10] that also combine spatial 
information with color histograms. 

2 Color Histograms 

Color histograms are frequently used to compare images. Examples of their use in multimedia applications include scene break detection and querying a database of images [7, 6, 2]. Color histograms are popular because they are trivial to compute, and tend to be robust against small changes in camera viewpoint. For example, Swain and Ballard [12] describe the use of color histograms for identifying objects. Stricker and Swain [11] analyze the information capacity of color histograms.

¹ The color images used in this paper can be found at http://www.cs.cornell.edu/home/rdz/refinement.html.

We will assume that all images are scaled to contain the same number of pixels $M$. We discretize the colorspace of the image such that there are $n$ distinct (discretized) colors. A color histogram $H$ is a vector $\langle h_1, h_2, \ldots, h_n \rangle$ in which each bucket $h_j$ contains the number of pixels of color $j$ in the image. Typically images are represented in the RGB colorspace, with a few of the most significant bits per color channel.

For a given image $I$, the color histogram $H_I$ is a compact summary of the image. A database of images can be queried to find the most similar image to $I$, and can return the image $I'$ with the most similar color histogram $H_{I'}$. Color histograms are typically compared using the $L_1$-distance or the $L_2$-distance, although more complex measures have also been considered [4].
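The histogram construction and the $L_1$ comparison can be sketched in NumPy as follows. This is a minimal illustration assuming 8-bit RGB input and 2 bits per channel, which gives 64 buckets (the bucket count of the benchmark in Section 4); the function names and the random image are hypothetical. Mirroring the image leaves the histogram unchanged, which shows that a histogram ignores spatial layout.

```python
import numpy as np

def color_histogram(img, bits=2):
    """Color histogram of an 8-bit RGB image of shape (H, W, 3).

    Keeps the `bits` most significant bits of each channel, giving
    n = 2**(3*bits) buckets (64 for bits=2).
    """
    q = (img >> (8 - bits)).astype(np.int64)           # discretize each channel
    codes = (q[..., 0] << (2 * bits)) | (q[..., 1] << bits) | q[..., 2]
    return np.bincount(codes.ravel(), minlength=1 << (3 * bits))

def l1_distance(h1, h2):
    """L1 distance between two histograms."""
    return int(np.abs(h1.astype(np.int64) - h2.astype(np.int64)).sum())

rng = np.random.default_rng(4)
img = rng.integers(0, 256, size=(168, 232, 3), dtype=np.uint8)  # hypothetical image
h = color_histogram(img)
h_flipped = color_histogram(img[:, ::-1])   # same pixels, mirrored layout
print(h.sum(), l1_distance(h, h_flipped))   # 38976 0
```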

3 Histogram Refinement 

In histogram refinement the pixels of a given bucket 
are subdivided into classes based on local features. 
There are many possible features, including texture, 
orientation, distance from the nearest edge, relative 
brightness, etc. Histogram refinement prevents pixels 
in the same bucket from matching each other if they 
do not fall into the same class. Pixels in the same class 
can be compared using any standard method for comparing histogram buckets (such as the $L_1$ distance).
This allows fine distinctions that cannot be made with 
color histograms. 

As a simple example of histogram refinement, con- 
sider a positional refinement where each pixel in a 
given color bucket is classified as either "in the cen- 
ter" of the image, or not. Specifically, the centermost 
75% of the pixels are defined as the "center". This 
produces a split histogram in which the pixels of color 
buckets are loosely constrained by their location in 
the image. The resulting split histograms can be compared using the $L_1$ distance. We will call this simple form of histogram refinement centering refinement.
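A minimal sketch of centering refinement follows. The text does not specify the shape of the "center"; here we assume a centered rectangle whose sides are scaled by $\sqrt{0.75}$, so that it covers about 75% of the pixels. The input is an array of already discretized color indices, and all names are hypothetical. Moving a block of color from the center to a corner leaves the plain histogram unchanged but changes the split histogram.

```python
import numpy as np

def center_mask(h, w, fraction=0.75):
    """Centered rectangle covering roughly `fraction` of an h x w image (assumed shape)."""
    s = np.sqrt(fraction)
    ch, cw = int(round(h * s)), int(round(w * s))
    top, left = (h - ch) // 2, (w - cw) // 2
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + ch, left:left + cw] = True
    return mask

def centering_split_histogram(codes, n_colors):
    """Split histogram: column 0 counts 'center' pixels, column 1 the rest.

    codes: (H, W) array of discretized color indices in [0, n_colors).
    """
    mask = center_mask(*codes.shape)
    inside = np.bincount(codes[mask], minlength=n_colors)
    outside = np.bincount(codes[~mask], minlength=n_colors)
    return np.stack([inside, outside], axis=1)

def l1(a, b):
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

# Same 20 x 20 block of color 1, placed in the center vs. in a corner.
a = np.zeros((100, 100), dtype=np.int64); a[40:60, 40:60] = 1
b = np.zeros((100, 100), dtype=np.int64); b[0:20, 0:20] = 1
sa, sb = centering_split_histogram(a, 64), centering_split_histogram(b, 64)
print(l1(sa.sum(axis=1), sb.sum(axis=1)), l1(sa, sb))   # 0 816
```

Summing the two columns recovers the ordinary color histogram, so the split histogram carries strictly more information.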

Color coherence vectors

CCV's are a more sophisticated form of histogram 
refinement, in which histogram buckets are partitioned 
based on spatial coherence. Our coherence measure 
classifies pixels as either coherent or incoherent. A 
coherent pixel is a part of a sizable contiguous region, 
while an incoherent pixel is not. A color coherence 
vector represents this classification for each color in 
the image. 

The initial stage in computing a CCV is similar to 
the computation of a color histogram. We first blur 
the image slightly by replacing pixel values with the 
average value in a small local neighborhood (currently 
including the 8 adjacent pixels). We then discretize 
the colorspace, such that there are only n distinct col- 
ors in the image. 

The next step is to classify the pixels within a given color bucket as either coherent or incoherent. A coherent pixel is part of a large group of pixels of the same color, while an incoherent pixel is not. We determine the pixel groups by computing connected components. A connected component $C$ is a maximal set of pixels such that for any two pixels $p, p' \in C$, there is a path in $C$ between $p$ and $p'$. We compute connected components using 4-connected neighbors within a given discretized color bucket. We classify each pixel as either coherent or incoherent depending on the size in pixels of its connected component. A pixel is coherent if the size of its connected component exceeds a fixed value $\tau$; otherwise, the pixel is incoherent.

For a given discretized color, some of the pixels with that color will be coherent and some will be incoherent. Let us call the number of coherent pixels of the $j$'th discretized color $\alpha_j$ and the number of incoherent pixels $\beta_j$. Clearly, the total number of pixels with that color is $\alpha_j + \beta_j$, and so a color histogram would summarize an image as $\langle \alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n \rangle$. Instead, for each color we compute the pair $(\alpha_j, \beta_j)$, which we will call the coherence pair for the $j$'th color. The color coherence vector for the image consists of $\langle (\alpha_1, \beta_1), \ldots, (\alpha_n, \beta_n) \rangle$. This is a vector of coherence pairs, one for each discretized color.
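The steps above (blur, discretize, label 4-connected components per bucket, threshold on $\tau$) can be sketched with NumPy and SciPy's `ndimage` module. This is a minimal sketch, not the authors' implementation: we assume a 3 × 3 box blur that includes the center pixel, 2 bits per channel (64 colors), and $\tau = 300$ as in the text; the function name and test image are hypothetical.

```python
import numpy as np
from scipy import ndimage

def ccv(img, bits=2, tau=300):
    """Color coherence vector of an 8-bit RGB image of shape (H, W, 3).

    Returns an (n, 2) array whose j'th row is the coherence pair
    (alpha_j, beta_j), with n = 2**(3*bits).
    """
    # Blur: mean over each pixel's 3x3 neighborhood, channel by channel.
    blurred = ndimage.uniform_filter(img.astype(np.float64), size=(3, 3, 1))
    # Discretize: keep the `bits` most significant bits of each channel.
    q = np.clip(blurred, 0, 255).astype(np.int64) >> (8 - bits)
    codes = (q[..., 0] << (2 * bits)) | (q[..., 1] << bits) | q[..., 2]
    n = 1 << (3 * bits)
    pairs = np.zeros((n, 2), dtype=np.int64)
    four = ndimage.generate_binary_structure(2, 1)    # 4-connectivity
    for j in np.unique(codes):
        labels, _ = ndimage.label(codes == j, structure=four)
        sizes = np.bincount(labels.ravel())[1:]       # drop label 0 (other colors)
        pairs[j, 0] = sizes[sizes > tau].sum()        # coherent: component > tau
        pairs[j, 1] = sizes[sizes <= tau].sum()       # incoherent
    return pairs

# Hypothetical 168 x 232 test image: black background with one 40 x 40 red square.
img = np.zeros((168, 232, 3), dtype=np.uint8)
img[50:90, 80:120, 0] = 255
pairs = ccv(img)
red = 3 << 4                                          # bucket of (255, 0, 0) with bits=2
print(pairs[red])                                     # (coherent, incoherent) counts
```

The blur moves the square's border pixels into neighboring buckets, so only its 38 × 38 interior lands in the pure-red bucket, where it forms a single coherent component.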

In our experiments, all images were scaled to contain $M = 38{,}976$ pixels, and we have used $\tau = 300$ pixels (so a region is classified as coherent if its area is about 1% of the image). With this value of $\tau$, an average image in our database consists of approximately 75% coherent pixels, with a standard deviation of 11%.

Two images $I$ and $I'$ can be compared using their CCV's, for example by using the $L_1$ distance. Let the coherence pairs for the $j$'th color bucket be $(\alpha_j, \beta_j)$ in $I$ and $(\alpha'_j, \beta'_j)$ in $I'$. Using the $L_1$ distance to compare CCV's, the $j$'th bucket's contribution to the distance between $I$ and $I'$ is

$$\Delta_{CCV} = |\alpha_j - \alpha'_j| + |\beta_j - \beta'_j|. \qquad (1)$$

Note that when using color histograms to compare $I$ and $I'$, the $j$'th bucket's contribution is

$$\Delta_{CH} = |(\alpha_j + \beta_j) - (\alpha'_j + \beta'_j)|. \qquad (2)$$

It follows that CCV's create a finer distinction than color histograms. A given color bucket $j$ can contain the same number of pixels in $I$ as in $I'$, but these pixels may be entirely incoherent in $I$ and entirely coherent in $I'$ (i.e., $\alpha_j = \beta'_j = 0$). Formally, $\Delta_{CH} \le \Delta_{CCV}$ follows from equations 1 and 2, and the fact that the $L_1$ distance obeys the triangle inequality.
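Equations (1) and (2) can be checked on the example just given with a short NumPy sketch; the function names and the two-bucket CCVs are hypothetical. Bucket 0 holds 500 pixels in both images, all incoherent in $I$ and all coherent in $I'$, so the histogram contribution vanishes while the CCV contribution does not.

```python
import numpy as np

def ccv_l1(p, q):
    """Sum over buckets of |alpha_j - alpha'_j| + |beta_j - beta'_j|   (equation 1)."""
    return int(np.abs(p - q).sum())

def hist_l1_from_ccv(p, q):
    """Sum over buckets of |(alpha_j + beta_j) - (alpha'_j + beta'_j)|   (equation 2)."""
    return int(np.abs(p.sum(axis=1) - q.sum(axis=1)).sum())

ccv_I  = np.array([[0, 500], [1000, 0]])   # rows are (alpha_j, beta_j)
ccv_I2 = np.array([[500, 0], [1000, 0]])
print(hist_l1_from_ccv(ccv_I, ccv_I2))     # 0: the color histograms agree
print(ccv_l1(ccv_I, ccv_I2))               # 1000: the CCVs do not
```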

4 Experimental Results 

We have implemented histogram refinement, and have used it for image retrieval from a large database. Our database consists of 14,554 images, which are drawn from a variety of sources. Our largest sources include the 11,667 images used in Chabot [6], the 1,440 images used in QBIC [2], and a 1,005 image database available from Corel. In addition, we included a few groups of images in PhotoCD format. Finally, we have taken a number of MPEG videos from the Web and segmented them into scenes. We have added one or two images from each scene to the database, totaling 349 images. The image database thus contains a wide variety of imagery.

We have compared our results with a number of color histogram variants. These include the $L_1$ and $L_2$ distances, with both 64 and 512 color buckets. We include a small amount of smoothing, as it empirically improved performance. On our database, the $L_1$ distance with the 64-bucket RGB colorspace gave the best results, and is used as a benchmark.

Hand examination of our database revealed 75 
pairs of images which contain different views of 
the same scene. Examples are shown in figures 2 
and 3. One image is selected as a query image, 
and the other represents a "correct" answer. In each 
case, we have shown where the second image ranks, 
when similarity is computed using color histograms 
or using histogram refinement. Specifically, results 
are shown using CCV's, centering refinement, and 
a successive refinement technique described in sec- 
tion 6.1. The color images shown are available at 
http://www.cs.cornell.edu/home/rdz/refinement.html 

4.1 Centering refinement results 

In 69 of the 75 cases, centering refinement produced better results, while in 4 cases it produced worse results (there were 2 cases where the ranks did not change). The average change in rank due to centering refinement was an improvement of 55 positions (this included all 75 cases). The average percentage change in rank was an improvement of 41%. In the 69 cases where centering refinement performed better than color histograms, the average improvement in rank was 60 positions, and the average percentage improvement was 49%. In the 4 cases where color histograms performed better than centering refinement, the average rank improvement was 10 positions. We have not yet analyzed these 4 cases to determine why centering refinement fails.

To analyze the statistical significance of this data, we formulate the null hypothesis $H_0$, which states that centering refinement is equally likely to cause a positive change in ranks (i.e., an improvement) or a negative change. We will discard the 2 ties to simplify the analysis. Under $H_0$, the expected number of positive changes is 36.5, with a standard deviation of $\sqrt{73}/2 \approx 4.27$. The actual number of positive changes is 69, which is more than 7.6 standard deviations greater than the number expected under $H_0$. We can therefore reject $H_0$ at any standard significance level (such as 99.9%).
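The arithmetic of this sign test (normal approximation, mean $n/2$ and standard deviation $\sqrt{n}/2$ under $H_0$) can be reproduced with a few lines of Python; the function name is hypothetical.

```python
import math

def sign_test_z(n_positive, n_trials):
    """Normal approximation to the sign test.

    Under H0 each non-tied query improves with probability 1/2, so the number of
    improvements has mean n/2 and standard deviation sqrt(n)/2.
    Returns (z-score, mean, standard deviation).
    """
    mean = n_trials / 2
    sd = math.sqrt(n_trials) / 2
    return (n_positive - mean) / sd, mean, sd

z, mean, sd = sign_test_z(69, 73)        # centering refinement: 69 improvements, 2 ties dropped
print(mean, round(sd, 2), round(z, 2))   # 36.5 4.27 7.61
```

Applied to the CCV results reported next (68 improvements out of 75), the same function gives a z-score of about 7.04.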

4.2 CCV results 

In 68 of the 75 cases, CCV's produced better re- 
sults, while in 7 cases they produced worse results. 
The average change in rank due to CCV's was an im- 
provement of 68 positions (note that this included the 
7 cases where CCV's do worse). The average percentage change in rank was an improvement of 35%. In the 68 cases where CCV's performed better than color histograms, the average improvement in rank was 77 positions, and the average percentage improvement was 56%. In the 7 cases where color histograms performed better, the average improvement was 17 positions.

The null hypothesis $H_0$ states that CCV's are equally likely to cause a positive change in ranks (i.e., an improvement) or a negative change. Under $H_0$, the expected number of positive changes is 37.5, with a standard deviation of $\sqrt{75}/2 \approx 4.33$. The actual number of positive changes is 68, which is more than 7 standard deviations greater than the number expected under $H_0$. We can therefore reject $H_0$ at any standard significance level (such as 99.9%).

When CCV's produced worse results, it was always 
due to a change in overall image brightness (i.e., the 
two images were almost identical, except that one was 
brighter than the other). Because CCV's use dis- 
cretized color buckets for segmentation, they are more 
sensitive to changes in overall image brightness than 
color histograms. We believe that this difficulty can be overcome by using a better colorspace than RGB, as we discuss in section 6.2.

4.3 Efficiency 

We have experimented with a number of different 
techniques for histogram refinement. CCV's are the 
most computationally expensive method of these, and 
will be our focus in discussing efficiency. 

There are two phases to the computation involved 
in querying an image database. First, when an im- 
age is inserted into the database, a CCV must be 
computed. Second, when the database is queried, 
some number of the most similar images must be re- 
trieved. Most methods for content-based indexing in- 
clude these distinct phases. For both color histograms 
and CCV's, these phases can be implemented in linear 
time with a single pass over the image. 

We ran our experiments on a 50 MHz SPARCsta- 
tion 20, and provide the results from color histogram- 
ming for comparison. Color histograms can be com- 
puted at 67 images per second, while CCV's can be 
computed at 5 images per second. Using color histograms, 21,940 comparisons can be performed per second, while with CCV's 7,746 can be performed
per second. The images used for benchmarking are 
232 x 168. Both implementations are preliminary, and 
the performance can definitely be improved. 

5 Related Work 

Our work has focused on the use of spatial information to refine color histograms. Recently, several authors have proposed algorithms for comparing images that combine spatial information with color histograms. Hsu et al. [5] attempt to capture the spatial arrangement of the different colors in the image, in order to perform more accurate content-based image retrieval. Rickman and Stonham [8] randomly sample the endpoints of small triangles and compare the distributions of these triplets. Smith and Chang [9] concentrate on queries that combine spatial information with color. Stricker and Dimai [10] divide the image into five partially overlapping regions and compute the first three moments of the color distributions in each image. We will discuss each approach in turn.

Histogram: 198. Centering refinement: 42. CCV: 33. Successive refinement: 6.

Histogram: 78. Centering refinement: 54. CCV: 12. Successive refinement: 7.

Histogram: 119. Centering refinement: 60. CCV: 36. Successive refinement: 25.

Histogram: 38. Centering refinement: 17. CCV: 4. Successive refinement: 1.

Figure 2: Example queries with their partner images, plus ranks under various methods. Lower ranks indicate better performance.

Histogram: 88. Centering refinement: 35. CCV: 20. Successive refinement: 13.

Histogram: 310. Centering refinement: 214. CCV: 177. Successive refinement: 160.

Histogram: 411. Centering refinement: 282. CCV: 84. Successive refinement: 56.

Histogram: 50. Centering refinement: 37. CCV: 27. Successive refinement: 22.

Figure 3: Additional example queries with ranks. Lower ranks indicate better performance.

Hsu [5] begins by selecting a set of representative colors from the image. Next, the image is partitioned into rectangular regions, where each region is predominantly a single color. The partitioning algorithm
makes use of maximum entropy. The similarity be- 
tween two images is the degree of overlap between re- 
gions of the same color. Hsu presents results from a 
database with 260 images, which show that their ap- 
proach can give better results than color histograms. 

While the authors do not report running times, 
it appears that Hsu's method requires substantially 
more computation than the approach we describe. A 
CCV can be computed in a single pass over the image, 
with a small number of operations per pixel. Hsu's 
partitioning algorithm in particular appears much 
more computationally intensive than our method. 
Hsu's approach can be extended to be independent 
of orientation and position, but the computation in- 
volved is quite substantial. In contrast, our method is 
naturally invariant to orientation and position. 

Rickman and Stonham [8] randomly sample pixel triples arranged in an equilateral triangle with a fixed side length. They use 16 levels of color hue, with non-uniform quantization. Approximately a quarter of the pixels are selected for sampling, and their method stores 372 bits per image. They report results from a database of 100 images.

Smith and Chang's algorithm also partitions the image into regions, but their approach is more elaborate than Hsu's. They allow a region to contain multiple different colors, and permit a given pixel to belong to several different regions. Their computation makes use of histogram back-projection [12] to back-project sets of colors onto the image. They then identify color sets with large connected components.
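The following Python sketch illustrates that idea in simplified form: back-project a set of color buckets onto the image as a binary mask, then measure the largest connected region of the mask. The function names, the use of 4-connectivity, and the binary (rather than ratio-weighted) back-projection are assumptions for illustration; Smith and Chang's actual set-selection and thresholding steps are not reproduced.

```python
def back_project(colors, color_set):
    """Binary back-projection: 1 where the pixel's bucket is in color_set, else 0."""
    return [[1 if v in color_set else 0 for v in row] for row in colors]


def largest_component(mask):
    """Size of the largest 4-connected region of 1s in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    best = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or (r0, c0) in seen:
                continue
            # Depth-first search over the region containing (r0, c0).
            stack, size = [(r0, c0)], 0
            seen.add((r0, c0))
            while stack:
                r, c = stack.pop()
                size += 1
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            best = max(best, size)
    return best
```

A color set whose back-projection yields a large component (here, compared against some caller-chosen size) would then be retained as a region descriptor.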

Smith and Chang's image database contains 3,100 images. Again, running times are not reported, although their algorithm does speed up back-projection queries by pre-computing the back-projections of certain color sets. Their algorithm can also handle certain kinds of queries that our work does not address; for example, they can find all the images where the sun is setting in the upper left part of the image.

Stricker and Dimai [10] compute moments for each channel in the HSV colorspace, where pixels close to the border have less weight. They store 45 floating point numbers per image. Their distance measure for two regions is a weighted sum of the differences in each of the three moments. The distance measure for a pair of images is the sum of the distance between the center regions, plus (for each of the 4 side regions) the minimum distance of that region to the corresponding region in the other image, when rotated by 0, 90, 180 or 270 degrees. Because the regions overlap, their method is insensitive to small rotations or translations. Because they explicitly handle rotations of 0, 90, 180 or 270 degrees, their method is not affected by these particular rotations. Their database contains over 11,000 images, but the performance of their method is only illustrated on 3 example queries. Like Smith and Chang, their method is designed to handle certain kinds of more complex queries that we do not consider.
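A rough Python sketch of this distance follows. It interprets the 45 stored numbers as 5 regions (center plus 4 sides) × 3 HSV channels × 3 moments, takes the three moments to be the mean, the standard deviation, and the cube root of the third central moment, and models rotation as a cyclic shift of the side regions. These choices, the unit moment weights, and all function names are our assumptions, not details confirmed by [10].

```python
import math


def weighted_moments(values, weights):
    """Mean, standard deviation and cube-rooted third central moment of one channel.

    weights lets pixels near a region border count less (weighting scheme assumed).
    """
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    third = sum(w * (v - mean) ** 3 for v, w in zip(values, weights)) / total
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return (mean, math.sqrt(var), skew)


def region_distance(a, b, w=(1.0, 1.0, 1.0)):
    """Weighted sum of moment differences; a and b are 3 channels x 3 moments."""
    return sum(w[k] * abs(ca[k] - cb[k]) for ca, cb in zip(a, b) for k in range(3))


def image_distance(img1, img2):
    """Each image is (center, [side0, side1, side2, side3]) with sides in cyclic order."""
    (c1, s1), (c2, s2) = img1, img2
    d = region_distance(c1, c2)
    for i in range(4):
        # A rotation by 90 * r degrees lines side i up with side (i + r) mod 4.
        d += min(region_distance(s1[i], s2[(i + r) % 4]) for r in range(4))
    return d
```

Under this interpretation, an image compared with a copy of itself rotated by a multiple of 90 degrees has distance 0.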



6 Extensions 

There are a number of ways in which our histogram refinement could be extended and improved. One generalization is to further subdivide split histograms based on additional features; we refer to this process as successive refinement. Another extension centers on improving the choice of colorspace.

6.1 Successive refinement 

In successive refinement the buckets in a split histogram are further subdivided based on additional features. Much as we distinguish between pixels of similar color by coherence, we can distinguish between pixels of similar coherence by some additional feature. We can apply this method repeatedly; each refinement imposes an additional constraint on what it means for two pixels to be similar.

We have implemented a simple successively refined histogram. A color histogram was first split with coherence constraints (to create a CCV). Successive refinement was then applied to both the coherent and the incoherent pixels of the CCV, using the centering refinement introduced in section 3. With successive refinement, pixels are divided into four classes based on coherence versus incoherence, and on whether or not they were in the centermost 75% of the image. The L1 distance was used as a comparison measure. Examples of the successively refined histogram's performance are shown in figures 2 and 3. These preliminary results seem promising: in each of the seven example queries shown, the successively refined histogram ranks the partner image better than the plain histogram, the centering refinement, or the CCV.
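A minimal Python sketch of the four-class successively refined histogram and its L1 comparison follows. It assumes the discretized colors and per-pixel coherence flags are already available (for example, from a CCV computation), and it reads "centermost 75% of the image" as a centered rectangle covering 75% of the image area; that interpretation and the function names are hypothetical.

```python
import math
from collections import Counter


def in_center(r, c, rows, cols, fraction=0.75):
    """Hypothetical centering test: pixel center lies in a centered rectangle of the given area fraction."""
    scale = math.sqrt(fraction)
    h, w = rows * scale, cols * scale
    top, left = (rows - h) / 2, (cols - w) / 2
    return top <= r + 0.5 < top + h and left <= c + 0.5 < left + w


def successive_refined_histogram(colors, coherent):
    """Count pixels keyed by (color bucket, coherent?, centered?), giving four classes per color."""
    rows, cols = len(colors), len(colors[0])
    hist = Counter()
    for r in range(rows):
        for c in range(cols):
            hist[(colors[r][c], coherent[r][c], in_center(r, c, rows, cols))] += 1
    return hist


def l1_distance(h1, h2):
    """L1 distance over the union of keys; a Counter returns 0 for absent keys."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))
```

On a 10 × 10 image this pixel-center test places 64 pixels, not exactly 75, in the center region; the discrepancy shrinks as images get larger.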

We have also investigated successive refinement based on intensity gradients. Again, the initial refinement was based on coherence, and the successive refinement was applied identically to coherent and incoherent pixels. We further classified pixels based on either the gradient magnitude or the gradient direction. The results we obtained are quite preliminary, but they seem to indicate a statistically significant improvement over CCVs.
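The gradient-magnitude variant can be sketched in the same way. Here each pixel's key gains a third component recording whether its gradient magnitude, estimated with unscaled central differences on a separate intensity image, exceeds a threshold. The threshold value, the difference scheme, and the function names are hypothetical; the text does not specify how gradients were computed or binned.

```python
import math
from collections import Counter


def gradient_magnitude(intensity, r, c):
    """Unscaled central differences, falling back to one-sided differences at the border."""
    rows, cols = len(intensity), len(intensity[0])
    gx = intensity[r][min(c + 1, cols - 1)] - intensity[r][max(c - 1, 0)]
    gy = intensity[min(r + 1, rows - 1)][c] - intensity[max(r - 1, 0)][c]
    return math.hypot(gx, gy)


def gradient_refined_histogram(colors, coherent, intensity, threshold=10.0):
    """Split each (color, coherence) bucket by whether the local gradient is strong."""
    hist = Counter()
    for r, row in enumerate(colors):
        for c, color in enumerate(row):
            strong = gradient_magnitude(intensity, r, c) > threshold
            hist[(color, coherent[r][c], strong)] += 1
    return hist
```

Refining by gradient direction instead would replace the boolean with a quantized angle, for example from `math.atan2(gy, gx)`.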

The best system of constraints to impose on the image is an open issue. Any combination of features might give effective results, and there are many possible features to choose from. However, it is possible to take advantage of the sequential structure of a successively refined histogram. One feature can serve as a filter for another, by ensuring that the second feature is only computed on pixels that already possess the first feature.

For example, the perimeter-to-area ratio can be used to classify the relative shapes of color regions. If we used this ratio as an initial refinement on color histograms, incoherent pixels would produce statistical outliers (an isolated pixel, for instance, has a perimeter of 4 pixel edges for an area of 1 pixel), and thus give questionable results. This feature is better employed after the coherent pixels have been segregated. Refining a histogram not only makes finer distinctions between pixels, but also functions as a statistical filter for successive refinements.
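A small Python sketch of this filtering idea follows: the perimeter-to-area ratio is computed only for coherent components, so scattered incoherent pixels never reach the shape feature. It assumes components are given as sets of `(row, col)` pixels, measures perimeter as the number of pixel edges bordering a non-member under 4-adjacency, and uses hypothetical function names and a caller-chosen threshold `tau`.

```python
def perimeter_to_area(component):
    """component: set of (row, col) pixels forming one connected region."""
    perimeter = 0
    for r, c in component:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) not in component:
                perimeter += 1  # this pixel edge lies on the region boundary
    return perimeter / len(component)


def shape_features(components, tau):
    """Apply the shape feature only to coherent components (size >= tau)."""
    return [perimeter_to_area(comp) for comp in components if len(comp) >= tau]
```

A 2 × 2 block scores 2.0 and a 1 × 4 strip scores 2.5, while a lone pixel would score 4.0 if it were not filtered out first.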

6.2 Choice of colorspace 

Many researchers spend considerable effort on selecting a good set of colors. Hsu [5], for example, assumes that the colors in the center of the image are more important than those at the periphery, while Smith and Chang [9] use several different thresholds to extract colors and regions. A wide variety of different colorspaces have also been investigated for content-based image retrieval, such as the opponent-axis colorspace [12] and the Munsell colorspace [2].

The choice of colorspace is a particularly significant issue for CCVs, since they use the discretized color buckets to segment the image. A perceptually uniform colorspace, such as CIE Lab, should result in better segmentations and improve the performance of CCVs. A related issue is the color constancy problem: objects of the same color can appear rather different depending upon the lighting conditions. The simplest manifestation of this problem is a change in overall image brightness; this is responsible for the negative examples obtained in our experiments with CCVs. Standard histogramming methods are sensitive to image gain. More sophisticated methods, such as color ratio histograms [3] or the use of color moments [10], might alleviate this problem. These methods, like most proposed improvements to color histograms, can also be used in histogram refinement. For example, color moments could be computed separately for coherent and incoherent pixels.
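The suggestion in the last sentence can be sketched in a few lines of Python: compute the mean and standard deviation of one color channel separately over coherent and incoherent pixels. Using only the first two moments and a single channel, and the function names below, are assumptions for illustration; the coherence flags are assumed to come from a CCV-style computation.

```python
import math


def moments(values):
    """Mean and (population) standard deviation of a list of channel values."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std


def split_moments(values, coherent):
    """Moments computed separately for coherent (True) and incoherent (False) pixels."""
    groups = {True: [], False: []}
    for v, flag in zip(values, coherent):
        groups[flag].append(v)
    # A class with no pixels has no moments; report None rather than dividing by zero.
    return {flag: moments(vs) if vs else None for flag, vs in groups.items()}
```

Each image would then be described by one moment vector per coherence class instead of a single vector.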

7 Conclusions 

We have described a method for imposing additional constraints on histogram-based matching called histogram refinement. This idea can be extended by placing further constraints on the split histogram itself. Both histogram refinement and successive refinement are general methods for improving the performance of histogram-based matching. If the initial histogram is a color histogram, and it is refined based on coherence, then the resulting split histogram is a CCV. But there is no requirement that this refinement be based on coherence, or even that the initial histogram be based on color.
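The generality described above can be captured in a short Python sketch: a histogram keyed by a base feature, refined by any number of further per-pixel features applied in turn. The `refine` name and the pixel representation are hypothetical; the base feature need not be color, and the refinements need not be coherence.

```python
from collections import Counter


def refine(pixels, base, *features):
    """Histogram keyed by base(p) followed by each refinement feature f(p) in order.

    pixels: any iterable of pixel records; base and features are caller-supplied
    functions (for example color bucket, coherence flag, centering flag).
    """
    hist = Counter()
    for p in pixels:
        hist[(base(p),) + tuple(f(p) for f in features)] += 1
    return hist
```

With `base` returning a color bucket and a single feature returning the coherence flag, the result carries the same counts as a CCV; adding a centering feature gives the successively refined histogram. Every refinement preserves the total pixel count.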

Most research in content-based image retrieval has focused on query by example (where the system automatically finds images similar to an input image). However, other types of queries are also important. For example, it is often useful to search for images in which a subset of another image (e.g. a particular object) appears. This would be particularly useful for queries on a database of videos. One approach to this problem might be to generalize histogram back-projection [12] to separate pixels based on spatial coherence, or some other local property.

Ever-larger image databases will demand more complex similarity measures. This added time complexity can be offset by using efficient, coarse measures that prune the search space by removing images which are clearly not the desired answer. Measures which are less efficient but more effective can then be applied to the remaining images. Baker and Nayar [1] have begun to investigate similar ideas for pattern recognition problems. Handling large image databases effectively will require a balance between increasingly fine measures (such as histogram refinement) and efficient coarse measures.
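The coarse-then-fine strategy can be sketched as a two-stage search in Python. The function name, the `keep` cutoff, and the idea of passing the two distance functions in as arguments are assumptions for illustration; in practice the coarse measure might be a plain color histogram and the fine one a refined histogram.

```python
def cascade_search(query, database, coarse, fine, keep=100):
    """Two-stage retrieval: prune with a cheap measure, then re-rank survivors with a costly one."""
    # Stage 1: the coarse distance is evaluated on every image in the database.
    shortlist = sorted(database, key=lambda item: coarse(query, item))[:keep]
    # Stage 2: the fine distance is evaluated only on the `keep` survivors.
    return sorted(shortlist, key=lambda item: fine(query, item))
```

The expensive measure is evaluated at most `keep` times per query regardless of database size, at the risk of discarding a correct answer that the coarse measure ranks poorly.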